Adaptive Natural Gradient Learning Algorithms for Unnormalized Statistical Models

The natural gradient is a powerful method to improve the transient dynamics of learning by utilizing the geometric structure of the parameter space. Many natural gradient methods have been developed for maximum likelihood learning, which is based on Kullback-Leibler (KL) divergence and its Fisher metric. However, they require the computation of the normalization constant and are not applicable to statistical models with an analytically intractable normalization constant. In this study, we extend the natural gradient framework to divergences for the unnormalized statistical models: score matching and ratio matching. In addition, we derive novel adaptive natural gradient algorithms that do not require computationally demanding inversion of the metric and show their effectiveness in some numerical experiments. In particular, experimental results in a multi-layer neural network model demonstrate that the proposed method can escape from the plateau phenomena much faster than the conventional stochastic gradient descent method.

[1]  Aapo Hyvärinen,et al.  Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..

[2]  Shun-ichi Amari,et al.  Information Geometry and Its Applications , 2016 .

[3]  Razvan Pascanu,et al.  Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines , 2013, ICLR.

[4]  Nando de Freitas,et al.  Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.

[5]  Aapo Hyvärinen,et al.  Some extensions of score matching , 2007, Comput. Stat. Data Anal..

[6]  Aapo Hyvärinen,et al.  A Two-Layer Model of Natural Stimuli Estimated with Score Matching , 2010, Neural Computation.

[7]  Nicolas Le Roux,et al.  Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.

[8]  Nando de Freitas,et al.  On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.

[9]  Ruslan Salakhutdinov,et al.  Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix , 2015, ICML.

[10]  S. Eguchi Second Order Efficiency of Minimum Contrast Estimators in a Curved Exponential Family , 1983 .

[11]  Kenji Fukumizu,et al.  Adaptive natural gradient learning algorithms for various stochastic models , 2000, Neural Networks.

[12]  Pascal Vincent,et al.  A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[13]  Kenji Fukumizu,et al.  Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons , 2000, Neural Computation.

[14]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.