Unit tests for stochastic optimization
Abstract: Optimization by stochastic gradient descent is an important component of many
large-scale machine learning algorithms. A wide variety of such optimization algorithms
have been devised; however, it is unclear whether these algorithms are robust and widely ...
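A minimal sketch of what one such convergence unit test might look like, assuming a plain SGD loop on a simple quadratic; the objective, step size, and tolerance here are illustrative choices, not taken from the paper:

```python
# Illustrative "unit test" for a stochastic optimizer: run plain SGD on
# f(x) = (x - 3)^2 and assert it reaches the known minimum at x = 3.
# All constants (lr, steps, tolerance) are assumptions for the sketch.

def sgd(grad, x0, lr=0.1, steps=200):
    """Plain gradient descent: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def test_sgd_converges_on_quadratic():
    # gradient of (x - 3)^2 is 2 * (x - 3); minimizer is x = 3
    grad = lambda x: 2.0 * (x - 3.0)
    x_final = sgd(grad, x0=0.0)
    assert abs(x_final - 3.0) < 1e-3

test_sgd_converges_on_quadratic()
```

A real test suite in this spirit would cover many objectives (ill-conditioned, non-smooth, noisy gradients) rather than a single benign quadratic.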
[CITATION] A deep and tractable density estimator
B Uria, I Murray, H Larochelle - 2014 - ICML
[PDF] Training bidirectional Helmholtz machines
Abstract Unsupervised training of deep generative models containing latent variables and
performing inference remains a challenging problem for complex, high dimensional
distributions. One basic approach to this problem is the so called Helmholtz machine and ...
Constructing and Mining Web-Scale Knowledge Graphs: WWW 2015 Tutorial
A Bordes, E Gabrilovich - … of the 24th International Conference on World …, 2015 - dl.acm.org
Abstract Recent years have witnessed a proliferation of large-scale knowledge graphs, such
as Freebase, Google's Knowledge Graph, YAGO, Facebook's Entity Graph, and Microsoft's
Satori. Whereas there is a large body of research on mining homogeneous graphs, this ...
[CITATION] Binarized MNIST dataset
H Larochelle - 2011
Matrix-free approximate equilibration
AM Bradley, W Murray - arXiv preprint arXiv:1110.2805, 2011 - arxiv.org
Abstract: The condition number of a diagonally scaled matrix, for appropriately chosen
scaling matrices, is often less than that of the original. Equilibration scales a matrix so that
the scaled matrix's row and column norms are equal. Scaling can be approximate. We ...
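To illustrate what "equilibrated" means in this entry, here is a dense, explicit sketch that alternately rescales rows and columns of a small matrix until all row and column 2-norms approach 1. The paper's contribution is a matrix-free approximate method; this explicit loop is only an assumed toy version of the scaling it targets:

```python
# Toy equilibration sketch (NOT the paper's matrix-free algorithm):
# alternately divide each row and each column by its 2-norm so the
# scaled matrix's row and column norms all converge toward 1.
import math

def equilibrate(A, iters=50):
    n, m = len(A), len(A[0])
    A = [row[:] for row in A]  # work on a copy
    for _ in range(iters):
        for i in range(n):  # normalize each row
            r = math.sqrt(sum(a * a for a in A[i]))
            if r > 0:
                A[i] = [a / r for a in A[i]]
        for j in range(m):  # normalize each column
            c = math.sqrt(sum(A[i][j] ** 2 for i in range(n)))
            if c > 0:
                for i in range(n):
                    A[i][j] /= c
    return A

B = equilibrate([[10.0, 0.1], [0.2, 5.0]])
# after iterating, every row and column 2-norm of B is close to 1
```

Such scaling often reduces the condition number relative to the original matrix, which is the motivation the abstract describes.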
Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial
A Bordes, E Gabrilovich - Proceedings of the 20th ACM SIGKDD …, 2014 - dl.acm.org
Abstract Recent years have witnessed a proliferation of large-scale knowledge graphs, such
as Freebase, YAGO, Google's Knowledge Graph, and Microsoft's Satori. Whereas there is a
large body of research on mining homogeneous graphs, this new generation of ...
[PDF] No more pesky learning rates.
Abstract The performance of stochastic gradient descent (SGD) depends critically on how
learning rates are tuned and decreased over time. We propose a method to automatically
adjust multiple learning rates so as to minimize the expected error at any one time. The ...
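The entry above proposes adjusting multiple (per-parameter) learning rates automatically. As a hedged illustration of that general idea, and not the paper's actual update rule, here is an AdaGrad-style scheme, which also gives each parameter its own effective rate derived from its gradient history:

```python
# AdaGrad-style per-parameter learning rates (an illustration of
# "multiple automatically adjusted rates", not the paper's method).
import math

def adagrad_step(x, g, accum, lr=0.5, eps=1e-8):
    """One update: each parameter i uses rate lr / sqrt(sum of its squared grads)."""
    for i in range(len(x)):
        accum[i] += g[i] * g[i]
        x[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
    return x, accum

# minimize the badly scaled quadratic f(x) = x0^2 + 100 * x1^2
x, accum = [1.0, 1.0], [0.0, 0.0]
for _ in range(500):
    g = [2.0 * x[0], 200.0 * x[1]]  # gradient of f
    x, accum = adagrad_step(x, g, accum)
```

The point of the per-parameter denominator is that steeply scaled coordinates (here x1) automatically receive smaller steps, so one global rate does not have to be hand-tuned to the worst coordinate.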
Deep unsupervised learning using nonequilibrium thermodynamics
J Sohl-Dickstein, EA Weiss… - arXiv preprint arXiv: …, 2015 - arxiv.org
Abstract: A central problem in machine learning involves modeling complex data-sets using
highly flexible families of probability distributions in which learning, sampling, inference, and
evaluation are still analytically or computationally tractable. Here, we develop an ...
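The core idea in this entry is a forward "diffusion" process that gradually destroys structure in the data by mixing in Gaussian noise, which a model is then trained to reverse. A minimal sketch of the forward noising chain on a scalar, with an assumed constant noise schedule (the schedule and step count are illustrative, not the paper's):

```python
# Forward diffusion sketch: repeatedly mix the signal with Gaussian noise,
# x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
# so the distribution of x_T approaches a standard Gaussian.
import math
import random

def forward_diffuse(x0, betas, rng):
    """Return the trajectory x_0 ... x_T of the forward noising process."""
    xs = [x0]
    x = x0
    for beta in betas:
        noise = rng.gauss(0.0, 1.0)
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * noise
        xs.append(x)
    return xs

rng = random.Random(0)
traj = forward_diffuse(2.0, [0.05] * 100, rng)  # 100 noising steps
```

The generative model's job, which this sketch omits, is to learn the reverse of each small noising step, which stays tractable precisely because each step is a small Gaussian perturbation.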
[CITATION] Efficient backprop
YA LeCun, L Bottou, GB Orr, KR Müller - Neural networks: Tricks of the trade