Unit tests for stochastic optimization

T Schaul, I Antonoglou, D Silver - arXiv preprint arXiv:1312.6055, 2013 - arxiv.org
Abstract: Optimization by stochastic gradient descent is an important component of many
large-scale machine learning algorithms. A wide variety of such optimization algorithms
have been devised; however, it is unclear whether these algorithms are robust and widely ...

[CITATION][C] A deep and tractable density estimator

B Uria, I Murray, H Larochelle - 2014 - ICML

[PDF][PDF] Training bidirectional Helmholtz machines

J Bornschein, S Shabanian, A Fischer, Y Bengio - 2015 - pdfs.semanticscholar.org
Abstract Unsupervised training of deep generative models containing latent variables and
performing inference remains a challenging problem for complex, high dimensional
distributions. One basic approach to this problem is the so called Helmholtz machine and ...

Constructing and Mining Web-Scale Knowledge Graphs: WWW 2015 Tutorial

A Bordes, E Gabrilovich - … of the 24th International Conference on World …, 2015 - dl.acm.org
Abstract Recent years have witnessed a proliferation of large-scale knowledge graphs, such
as Freebase, Google's Knowledge Graph, YAGO, Facebook's Entity Graph, and Microsoft's
Satori. Whereas there is a large body of research on mining homogeneous graphs, this ...

[CITATION][C] Binarized mnist dataset

H Larochelle - 2011

Matrix-free approximate equilibration

AM Bradley, W Murray - arXiv preprint arXiv:1110.2805, 2011 - arxiv.org
Abstract: The condition number of a diagonally scaled matrix, for appropriately chosen
scaling matrices, is often less than that of the original. Equilibration scales a matrix so that
the scaled matrix's row and column norms are equal. Scaling can be approximate. We ...

Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial

A Bordes, E Gabrilovich - Proceedings of the 20th ACM SIGKDD …, 2014 - dl.acm.org
Abstract Recent years have witnessed a proliferation of large-scale knowledge graphs, such
as Freebase, YAGO, Google's Knowledge Graph, and Microsoft's Satori. Whereas there is a
large body of research on mining homogeneous graphs, this new generation of ...

[PDF][PDF] No more pesky learning rates.

T Schaul, S Zhang, Y LeCun - ICML (3), 2013 - jmlr.org
Abstract The performance of stochastic gradient descent (SGD) depends critically on how
learning rates are tuned and decreased over time. We propose a method to automatically
adjust multiple learning rates so as to minimize the expected error at any one time. The ...

Deep unsupervised learning using nonequilibrium thermodynamics

J Sohl-Dickstein, EA Weiss… - arXiv preprint arXiv: …, 2015 - arxiv.org
Abstract: A central problem in machine learning involves modeling complex data-sets using
highly flexible families of probability distributions in which learning, sampling, inference, and
evaluation are still analytically or computationally tractable. Here, we develop an ...

[CITATION][C] Efficient backprop

YA LeCun, L Bottou, GB Orr, KR Müller - Neural networks: Tricks of the trade