Authors
David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever
Publication date
2013/12/16
Journal
arXiv preprint arXiv:1312.4314
Description
Abstract: Mixtures of Experts combine the outputs of several "expert" networks, each of which
specializes in a different part of the input space. This is achieved by training a "gating"
network that maps each input to a distribution over the experts. Such models show promise
for building larger networks that are still cheap to compute at test time, and more
parallelizable at training time. In this work, we extend the Mixture of Experts to a stacked
model, the Deep Mixture of Experts, with multiple sets of gating and experts. This ...
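
The architecture sketched in the abstract can be illustrated with a short, self-contained NumPy example. This is not the authors' code: a gating network maps each input to a softmax distribution over experts, the layer output is the gate-weighted sum of the expert outputs, and two such layers are stacked to form a toy "deep" mixture. The layer sizes, ReLU nonlinearity, and random weights are illustrative assumptions.

# Minimal sketch (assumptions: layer sizes, ReLU experts, random weights),
# not the implementation from the paper.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, expert_ws, gate_w):
    """One mixture-of-experts layer.

    x:         (batch, d_in) inputs
    expert_ws: list of (d_in, d_out) weight matrices, one per expert
    gate_w:    (d_in, n_experts) gating weights
    """
    gates = softmax(x @ gate_w)                       # (batch, n_experts): distribution over experts
    expert_outs = np.stack(
        [np.maximum(x @ w, 0.0) for w in expert_ws],  # each expert: ReLU(x W_i)
        axis=1,                                       # (batch, n_experts, d_out)
    )
    return np.einsum("be,bed->bd", gates, expert_outs)  # gate-weighted sum of expert outputs

# Two stacked sets of gating and experts: a toy deep mixture of experts.
batch, d_in, d_hid, d_out, n_experts = 4, 8, 16, 10, 3
x = rng.standard_normal((batch, d_in))

experts1 = [rng.standard_normal((d_in, d_hid)) * 0.1 for _ in range(n_experts)]
gate1 = rng.standard_normal((d_in, n_experts)) * 0.1
experts2 = [rng.standard_normal((d_hid, d_out)) * 0.1 for _ in range(n_experts)]
gate2 = rng.standard_normal((d_hid, n_experts)) * 0.1

h = moe_layer(x, experts1, gate1)   # first set of gating and experts
y = moe_layer(h, experts2, gate2)   # second set of gating and experts
print(y.shape)                      # (4, 10)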