Authors
Andrew Senior, Georg Heigold, Ke Yang
Publication date
2013/5/26
Conference
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
6724-6728
Publisher
IEEE
Description
Abstract: Recent deep neural network systems for large vocabulary speech recognition
are trained with minibatch stochastic gradient descent but use a variety of learning rate
scheduling schemes. We investigate several of these schemes, particularly AdaGrad. Based
on our analysis of its limitations, we propose a new variant, 'AdaDec', that decouples
long-term learning-rate scheduling from per-parameter learning rate variation. AdaDec was
found to result in higher frame accuracies than other methods. Overall, careful choice of learning ...
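The abstract contrasts AdaGrad, whose per-parameter rates shrink with a lifetime sum of squared gradients, with AdaDec, which decouples a global learning-rate schedule from the per-parameter variation. The sketch below illustrates that distinction; the AdaGrad update follows the standard published rule, while the AdaDec-style update is an assumption based only on the abstract's description (a decayed accumulator plus a separate global schedule `eta_t`), not the paper's exact formula. All function and parameter names here are illustrative.

```python
import math

def adagrad_step(theta, grad, accum, eta=0.1, eps=1e-8):
    """Standard AdaGrad: per-parameter rate eta / sqrt(lifetime sum of g^2).

    The accumulator only grows, so the effective rate decays forever,
    coupling long-term scheduling to per-parameter adaptation.
    """
    new_theta, new_accum = [], []
    for t, g, a in zip(theta, grad, accum):
        a = a + g * g                            # lifetime sum of squared grads
        new_theta.append(t - eta * g / (math.sqrt(a) + eps))
        new_accum.append(a)
    return new_theta, new_accum

def adadec_step(theta, grad, accum, step,
                eta0=0.1, gamma=0.999, half_life=1000.0, eps=1e-8):
    """Illustrative AdaDec-style update (form assumed, not from the paper):
    a *decayed* accumulator supplies per-parameter variation, while a
    separate global schedule eta(t) controls long-term learning-rate decay.
    """
    eta_t = eta0 * 0.5 ** (step / half_life)     # global schedule, decoupled
    new_theta, new_accum = [], []
    for t, g, a in zip(theta, grad, accum):
        a = gamma * a + g * g                    # decayed, not lifetime, sum
        new_theta.append(t - eta_t * g / (math.sqrt(a) + eps))
        new_accum.append(a)
    return new_theta, new_accum
```

As a usage sketch, both updates drive a one-dimensional quadratic loss f(x) = x^2 (gradient 2x) toward zero; the difference is that AdaGrad's step sizes can only shrink, whereas the AdaDec-style rule's decayed accumulator leaves long-term decay to the explicit schedule.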