Authors
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, Tony Robinson
Publication date
2013/12/11
Journal
arXiv preprint arXiv:1312.3005
Description
Abstract: We propose a new benchmark corpus to be used for measuring progress in
statistical language modeling. With almost one billion words of training data, we hope this
benchmark will be useful to quickly evaluate novel language modeling techniques, and to
compare their contribution when combined with other advanced techniques. We show
performance of several well-known types of language models, with the best results achieved
with a recurrent neural network based language model. The baseline unpruned Kneser- ...