A Bayesian Unsupervised Learning Algorithm that Scales

A persistent worry with computational models of unsupervised learning is that learning will become more difficult as the problem is scaled up. We examine this issue in the context of a novel hierarchical generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model performs perceptual inference in a probabilistically consistent manner by using top-down, bottom-up, and lateral connections. These connections can be learned using simple rules that require only locally available information. We first demonstrate that the model can extract a sparse, distributed, hierarchical representation of depth from simplified random-dot stereograms. We then investigate some of the scaling properties of the algorithm on this problem and find that: (1) increasing the image size leads to faster and more reliable learning; (2) increasing the depth of the network from one to two hidden layers leads to better representations at the first hidden layer; and (3) once one part of the network has discovered how to represent depth, it “supervises” other parts of the network, greatly speeding up their learning.
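To make the kind of local learning rule the abstract alludes to concrete, here is a minimal sketch in the spirit of the wake-sleep algorithm: a one-hidden-layer binary belief net with separate bottom-up (recognition) and top-down (generative) connections, trained on simplified random-dot stereograms in which the right eye's image is the left eye's random dot pattern shifted by a global disparity. The layer sizes, learning rate, binary sigmoid units, and helper names are illustrative assumptions; the paper's actual model (which also uses lateral connections and a deeper hierarchy) is richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 8                  # pixels per eye (assumed toy size)
Nv, Nh = 2 * N, 4      # visible units (both eyes), hidden units
lr = 0.05              # illustrative learning rate

# Bottom-up (recognition) and top-down (generative) weights and biases
W_rec = rng.normal(0, 0.1, (Nh, Nv)); b_rec = np.zeros(Nh)
W_gen = rng.normal(0, 0.1, (Nv, Nh)); b_gen = np.zeros(Nv)
b_top = np.zeros(Nh)   # generative prior over the hidden units

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def stereogram():
    """Simplified random-dot stereogram: left eye gets random dots,
    right eye gets the same dots shifted by a global disparity
    (+1 = 'near', -1 = 'far')."""
    left = sample(np.full(N, 0.5))
    right = np.roll(left, rng.choice([-1, 1]))
    return np.concatenate([left, right])

for step in range(10000):
    # Wake phase: recognize a real stereogram, then nudge the
    # generative weights to reconstruct it from the sampled code.
    v = stereogram()
    h = sample(sigmoid(W_rec @ v + b_rec))
    v_pred = sigmoid(W_gen @ h + b_gen)
    W_gen += lr * np.outer(v - v_pred, h)
    b_gen += lr * (v - v_pred)
    b_top += lr * (h - sigmoid(b_top))

    # Sleep phase: fantasize from the generative model, then nudge
    # the recognition weights to recover the fantasy's hidden causes.
    h_f = sample(sigmoid(b_top))
    v_f = sample(sigmoid(W_gen @ h_f + b_gen))
    h_pred = sigmoid(W_rec @ v_f + b_rec)
    W_rec += lr * np.outer(h_f - h_pred, v_f)
    b_rec += lr * (h_f - h_pred)
```

Note that every weight change above is a delta rule using only the activity at the two ends of a connection and a locally computed prediction, which is the sense in which such connections can be learned from locally available information.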
