Dynamical analysis of contrastive divergence learning: Restricted Boltzmann machines with Gaussian visible units

The restricted Boltzmann machine (RBM) is an essential building block of deep learning, but it is hard to train with maximum likelihood (ML) learning, which minimizes the Kullback-Leibler (KL) divergence. Instead, contrastive divergence (CD) learning has been developed as an approximation of ML learning and is widely used in practice. To clarify the performance of CD learning, we analytically derive in this paper the fixed points to which the ML and CDn learning rules converge in two types of RBMs: one with Gaussian visible and Gaussian hidden units, and the other with Gaussian visible and Bernoulli hidden units. In addition, we analyze the stability of these fixed points. We find that, in a Gaussian-Gaussian RBM, the stable points of the CDn learning rule coincide with those of the ML learning rule, and that the larger principal components of the input data are extracted at these stable points. Moreover, in a Gaussian-Bernoulli RBM, both ML and CDn learning can extract independent components at one of the stable points. Our analysis demonstrates that the same feature components as those extracted by ML learning can be obtained simply by performing CD1 learning. Extending this study should elucidate the specific solutions obtained by CD learning in other types of RBMs and in deep networks.
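
For readers unfamiliar with the CD1 update discussed in the abstract, the following is a minimal NumPy sketch of a single CD1 step for a Gaussian-Bernoulli RBM with unit-variance visible units. The variable names (W, b, c, eta) and the unit-variance assumption are illustrative choices for this sketch and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, eta=1e-3):
    """One CD1 step for a Gaussian-Bernoulli RBM (unit-variance visibles).

    v0 : (n_vis,) data vector
    W  : (n_vis, n_hid) weight matrix
    b  : (n_vis,) visible biases, c : (n_hid,) hidden biases
    """
    # Positive phase: hidden activation probabilities given the data vector.
    p_h0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase (one Gibbs step): Gaussian visible reconstruction with
    # mean W h0 + b and unit variance, then the new hidden probabilities.
    v1 = W @ h0 + b + rng.standard_normal(b.shape)
    p_h1 = sigmoid(v1 @ W + c)

    # CD1 parameter update: data statistics minus reconstruction statistics.
    W += eta * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b += eta * (v0 - v1)
    c += eta * (p_h0 - p_h1)
    return W, b, c
```

ML learning would replace the reconstruction statistics above with expectations under the model distribution (requiring the Gibbs chain to run to equilibrium); CDn truncates that chain after n steps, with n = 1 shown here.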
