What kind of graphical model is the brain?

If neurons are treated as latent variables, our visual systems are non-linear, densely-connected graphical models containing billions of variables and thousands of billions of parameters. Current algorithms would have difficulty learning a graphical model of this scale. Starting with an algorithm that has difficulty learning more than a few thousand parameters, I describe a series of progressively better learning algorithms, all of which are designed to run on neuron-like hardware. The latest member of this series can learn deep, multi-layer belief nets quite rapidly. It turns a generic network with three hidden layers and 1.7 million connections into a very good generative model of handwritten digits. After learning, the model gives classification performance that is comparable to the best discriminative methods.
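The abstract does not spell out the update rule, but the layer-wise training it describes rests on contrastive divergence learning of restricted Boltzmann machines, the building block of these deep belief nets. The Python sketch below shows a single CD-1 weight update for a binary RBM; the function names, layer sizes, and learning rate are illustrative assumptions rather than details taken from the paper.

# Minimal sketch of one-step contrastive divergence (CD-1) for a binary
# restricted Boltzmann machine. All hyperparameters are illustrative
# assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr=0.1):
    """One CD-1 update of weights and biases from a batch of visible vectors v0."""
    # Positive phase: sample hidden units given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one step of reconstruction, then hidden probabilities.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)

    # Contrastive divergence estimate of the gradient.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

# Toy usage: 784 visible units (28x28 digit pixels), 500 hidden units.
W = 0.01 * rng.standard_normal((784, 500))
b_vis = np.zeros(784)
b_hid = np.zeros(500)
v0 = (rng.random((32, 784)) < 0.5).astype(float)  # stand-in for binarized digit data
W, b_vis, b_hid = cd1_update(W, b_vis, b_hid, v0)

Stacking such RBMs, each trained on the hidden activities of the one below, yields the greedy layer-by-layer procedure that makes learning the deep, multi-layer belief nets described above tractable.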
