1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs F Seide, H Fu, J Droppo, G Li, D Yu (Interspeech) Fifteenth Annual Conference of the International Speech …, 2014 | 1051 | 2014 |
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing H Fu, C Li, X Liu, J Gao, A Celikyilmaz, L Carin (NAACL) Proceedings of the 2019 Conference of the North American Chapter of …, 2019 | 380 | 2019 |
On parallelizability of stochastic gradient descent for speech DNNs F Seide, H Fu, J Droppo, G Li, D Yu (ICASSP) 2014 IEEE International Conference on Acoustics, Speech and Signal …, 2014 | 98 | 2014 |
F Seide, H Fu, J Droppo, G Li, D Yu (ICASSP) 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 235-239, 2014 | 57* | 2014 |
Improving text generation with student-forcing optimal transport J Li, C Li, G Wang, H Fu, Y Lin, L Chen, Y Zhang, C Tao, R Zhang, ... Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020 | 14 | 2020 |
Flexible Text Modeling with Semi-Implicit Latent Representations H Fu, C Li, K Bai, J Gao, L Carin Neural Information Processing Systems: Bayesian Deep Learning NeurIPS 2019 …, 2019 | | 2019 |