Authors
Andrew Senior, Georg Heigold, Ke Yang
Publication date
2013/5/26
Conference
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
6724-6728
Publisher
IEEE
Description
ABSTRACT Recent deep neural network systems for large vocabulary speech recognition
are trained with minibatch stochastic gradient descent but use a variety of learning rate
scheduling schemes. We investigate several of these schemes, particularly AdaGrad. Based
on our analysis of its limitations, we propose a new variant 'AdaDec' that decouples long-term
learning-rate scheduling from per-parameter learning rate variation. AdaDec was found
to result in higher frame accuracies than other methods. Overall, careful choice of learning ...
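As a rough illustration of the distinction the abstract draws, the sketch below contrasts a standard AdaGrad step with a hypothetical "decoupled" step. The AdaGrad update is the well-known one: each parameter's step is divided by the square root of its accumulated squared gradients. The second function is only an assumption about what decoupling could look like (a decaying accumulator plus an externally scheduled global rate); it is not the paper's AdaDec formula, which the truncated abstract does not give. The names `adagrad_step`, `decoupled_step`, `base_lr`, `global_lr`, `decay`, and `eps` are all illustrative.

```python
import math


def adagrad_step(params, grads, accum, base_lr=0.1, eps=1e-8):
    """One standard AdaGrad update over plain Python lists.

    Each parameter keeps a running sum of its squared gradients, so its
    effective learning rate can only shrink as training proceeds.
    """
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g  # accumulate squared gradient
        new_accum.append(a)
        new_params.append(p - base_lr * g / (math.sqrt(a) + eps))
    return new_params, new_accum


def decoupled_step(params, grads, accum, global_lr, decay=0.99, eps=1e-8):
    """Hypothetical decoupled update (an assumption, NOT the paper's formula).

    The accumulator decays, so it only captures recent per-parameter
    gradient scale, while `global_lr` is supplied by a separate long-term
    schedule chosen by the caller.
    """
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = decay * a + g * g  # decayed accumulator
        new_accum.append(a)
        new_params.append(p - global_lr * g / (math.sqrt(a) + eps))
    return new_params, new_accum
```

In this sketch the caller would compute `global_lr` from whatever long-term schedule is preferred (for example an exponential decay over minibatches) and pass it in on every step, which keeps the schedule independent of the per-parameter scaling.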