Dynamics of Learning in Multilayer Perceptrons

The dynamical behavior of learning is known to be very slow for the multilayer perceptron, which is often trapped in a "plateau." It has recently been understood that this is due to the singularities in the parameter space of perceptrons, in which the trajectories of learning are drawn. The space is Riemannian from the point of view of information geometry and contains singular regions where the Riemannian metric, that is, the Fisher information matrix, degenerates. This paper analyzes the dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity. We give explicit asymptotic analytical solutions (trajectories) for both the standard gradient descent (SGD) and natural gradient descent (NGD) methods. It is clearly shown that, in the case of the SGD method, the plateau phenomenon appears in a neighborhood of the critical regions, where the dynamical behavior is extremely slow. The analysis of the NGD method is much more difficult, because the inverse of the Fisher information matrix diverges there. We overcome this difficulty by introducing the "blow-down" technique used in algebraic geometry. The NGD method works efficiently, and the state converges directly to the true parameters very quickly, whereas it staggers in the case of the SGD method. The analytical results are compared with computer simulations and show good agreement. An explicit analysis of this kind had not been given previously; by studying the learning dynamics analytically in a neighborhood of the singularities, the effects of singularities on learning are qualitatively clarified for both the SGD and NGD methods, explaining the slow convergence of the former and the quick convergence of the latter.
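For concreteness, the two update rules compared above can be stated in symbols; this is a minimal sketch in standard form, not reproduced from the paper itself. Here $\ell(\theta)$ denotes the student's loss, $\eta$ the learning rate, and $G(\theta)$ the Fisher information matrix (the Riemannian metric), which degenerates on the singular regions so that its inverse diverges there.

    \text{SGD:}\quad \theta_{t+1} = \theta_t - \eta\, \nabla_\theta \ell(\theta_t)
    \text{NGD:}\quad \theta_{t+1} = \theta_t - \eta\, G(\theta_t)^{-1} \nabla_\theta \ell(\theta_t)

Near a singular region $G(\theta)$ loses rank, which is why the SGD trajectory slows into a plateau, while the NGD update, once the divergence of $G^{-1}$ is handled (for example, via the blow-down technique), moves directly toward the true parameters.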
