Variational Learning in Nonlinear Gaussian Belief Networks

We view perceptual tasks such as vision and speech recognition as inference problems in which the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables underlying input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as classification and depth estimation. In this article, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set and give results on two visual feature extraction problems. We also show how the variational method can be used for pattern classification and compare the performance of these nonlinear networks with other methods on the problem of handwritten digit recognition.
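To make the generative process concrete, the sketch below samples from a small two-layer network of this kind: each unit forms a linear combination of its parents' outputs, adds gaussian noise (the linear gaussian unit), and passes the result through a nonlinearity. The layer sizes, weight scales, and the choice of tanh are illustrative assumptions, not the configuration used in the article; learning would then maximize the variational lower bound log p(x) >= E_q[log p(x, z) - log q(z)], which is not shown here.

    import numpy as np

    # Illustrative sketch of ancestral sampling in a two-layer nonlinear
    # gaussian belief network. All sizes, scales, and the tanh
    # nonlinearity are assumptions for the example.
    rng = np.random.default_rng(0)
    n_top, n_hidden, n_visible = 4, 16, 64

    # Weights and per-unit noise variances for each layer.
    W1 = rng.normal(0.0, 0.1, size=(n_hidden, n_top))
    W2 = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    var1 = np.full(n_hidden, 0.05)
    var2 = np.full(n_visible, 0.05)

    def sample_layer(parents, W, var, f=np.tanh):
        # Linear gaussian pre-activation, then the unit's nonlinearity f.
        pre = W @ parents + rng.normal(0.0, np.sqrt(var))
        return f(pre)

    # Top-level units: unit-variance gaussians passed through the nonlinearity.
    z_top = np.tanh(rng.normal(size=n_top))
    z_hidden = sample_layer(z_top, W1, var1)
    # Linear (identity) output units for real-valued observations.
    x_visible = sample_layer(z_hidden, W2, var2, f=lambda u: u)

Choosing the identity for every nonlinearity reduces a single hidden layer of this model to factor analysis, which is why the framework is described as unifying: the nonlinearity applied to each linear gaussian unit determines the unit type.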
