Authors
Tara N Sainath, Ron J Weiss, Andrew Senior, Kevin W Wilson, Oriol Vinyals
Publication date
2015/9
Journal
Proc. Interspeech
Description
Abstract: Learning an acoustic model directly from the raw waveform has been an active area
of research. However, waveform-based models have not yet matched the performance of
log-mel trained neural networks. We will show that raw waveform features match the
performance of log-mel filterbank energies when used with a state-of-the-art CLDNN
acoustic model trained on over 2,000 hours of speech. Specifically, we will show the benefit
of the CLDNN, namely the time convolution layer in reducing temporal variations, the ...
Scholar articles
TN Sainath, RJ Weiss, A Senior, KW Wilson, O Vinyals - Proc. Interspeech, 2015