Dynamics of Learning in Multilayer Perceptrons Near Singularities

Learning in the multilayer perceptron is known to be very slow, with the dynamics often trapped in a "plateau." It has recently been understood that this is due to singularities in the parameter space of perceptrons, the space in which learning trajectories are drawn. From the point of view of information geometry, this space is Riemannian and contains singular regions where the Riemannian metric, that is, the Fisher information matrix, degenerates. This paper analyzes the dynamics of learning in a neighborhood of the singular regions when the true teacher machine lies at the singularity. We give explicit asymptotic analytical solutions (trajectories) for both the standard gradient (SGD) and natural gradient (NGD) methods. For the SGD method, it is clearly shown that the plateau phenomenon appears in a neighborhood of the critical regions, where the dynamics are extremely slow. The analysis of the NGD method is much more difficult because the inverse of the Fisher information matrix diverges. We overcome this difficulty by introducing the "blow-down" technique used in algebraic geometry. The NGD method works efficiently: the state converges directly and very quickly to the true parameters, whereas it staggers under the SGD method. The analytical results are compared with computer simulations and show good agreement. The effects of singularities on learning are thus qualitatively clarified for both the SGD and NGD methods.
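
The contrast between the two update rules can be illustrated with a small numerical sketch. This is not the model or the analysis of the paper: it assumes a hypothetical two-parameter student f(x; w, v) = v tanh(wx), a teacher f*(x) = 0 (which lies on the singular region v = 0 or w = 0), unit-variance Gaussian output noise (so the Fisher information matrix is the average outer product of the output gradient), and a small fixed set of inputs. The names `XS`, `fisher`, `sgd_step`, `ngd_step`, and `run` are chosen for illustration only. The Fisher matrix is inverted directly at points where it is still nonsingular; the blow-down technique itself is not reproduced here.

```python
import math

# Hypothetical fixed inputs used to approximate expectations over x.
XS = [-2.0, -1.0, 1.0, 2.0]


def outputs_and_grads(w, v):
    """For each input x, return (f, df/dw, df/dv) for f = v * tanh(w * x)."""
    rows = []
    for x in XS:
        t = math.tanh(w * x)
        rows.append((v * t, v * x * (1.0 - t * t), t))
    return rows


def loss(w, v):
    """Mean squared error against the teacher f*(x) = 0."""
    return 0.5 * sum(f * f for f, _, _ in outputs_and_grads(w, v)) / len(XS)


def loss_gradient(w, v):
    """Gradient of the loss with respect to (w, v)."""
    rows = outputs_and_grads(w, v)
    n = len(rows)
    gw = sum(f * dw for f, dw, _ in rows) / n
    gv = sum(f * dv for f, _, dv in rows) / n
    return gw, gv


def fisher(w, v):
    """Fisher matrix entries (G_ww, G_wv, G_vv), assuming unit Gaussian noise."""
    rows = outputs_and_grads(w, v)
    n = len(rows)
    gww = sum(dw * dw for _, dw, _ in rows) / n
    gwv = sum(dw * dv for _, dw, dv in rows) / n
    gvv = sum(dv * dv for _, _, dv in rows) / n
    return gww, gwv, gvv


def sgd_step(w, v, eta):
    """Standard gradient update: theta <- theta - eta * grad."""
    gw, gv = loss_gradient(w, v)
    return w - eta * gw, v - eta * gv


def ngd_step(w, v, eta):
    """Natural gradient update: theta <- theta - eta * G^{-1} grad."""
    gw, gv = loss_gradient(w, v)
    gww, gwv, gvv = fisher(w, v)
    det = gww * gvv - gwv * gwv  # tends to 0 as v -> 0 or w -> 0
    if det <= 0.0:
        raise ZeroDivisionError("Fisher matrix is singular at this point")
    # Explicit 2x2 inverse applied to the gradient.
    step_w = (gvv * gw - gwv * gv) / det
    step_v = (gww * gv - gwv * gw) / det
    return w - eta * step_w, v - eta * step_v


def run(steps=100, eta=0.1, w0=0.5, v0=0.5):
    """Run both methods from the same start; return final (w, v, loss) for each."""
    ws, vs = w0, v0
    wn, vn = w0, v0
    for _ in range(steps):
        ws, vs = sgd_step(ws, vs, eta)
        wn, vn = ngd_step(wn, vn, eta)
    return (ws, vs, loss(ws, vs)), (wn, vn, loss(wn, vn))


if __name__ == "__main__":
    sgd, ngd = run()
    print("SGD final (w, v, loss):", sgd)
    print("NGD final (w, v, loss):", ngd)
```

In this toy setting the loss gradient equals the Fisher matrix applied to the vector (0, v), so the natural gradient step leaves w fixed and shrinks v by a factor of (1 - eta) per step, while the standard gradient step slows down as both parameters approach the singular region. The sketch is only meant to make the qualitative contrast concrete under the stated assumptions.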
