Statistical learning by natural gradient descent

Based on stochastic perceptron models and statistical inference, we train single-layer and two-layer perceptrons by natural gradient descent. We have discovered an efficient scheme to represent the Fisher information matrix of a stochastic two-layer perceptron, and based on this scheme we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). Simulations confirm that the natural gradient descent learning rule is not only efficient but also robust.
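To make the update rule concrete, the following is a minimal NumPy sketch of natural gradient descent for a single-layer stochastic perceptron with Gaussian output noise. It does not reproduce the paper's O(n) representation of the two-layer Fisher matrix; instead it estimates the Fisher information by Monte Carlo and solves the resulting linear system directly. All constants (noise level, learning rate, sample sizes) and the tanh transfer function are illustrative assumptions, not values taken from the paper.

```python
# Natural gradient descent for a single-layer stochastic perceptron
# y = tanh(w.x) + Gaussian noise.  Update: w <- w - eta * G^{-1} grad,
# where G is the Fisher information matrix of the model.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, eta = 10, 0.1, 0.5           # input dim, noise level, learning rate (assumed)
w_true = rng.standard_normal(n)         # hypothetical teacher weights
w = np.zeros(n)                         # student weights

def fisher(w, X, sigma):
    """Monte Carlo estimate of G = E[f'(w.x)^2 x x^T] / sigma^2 for f = tanh."""
    d = 1.0 - np.tanh(X @ w) ** 2       # f'(w.x) at each sample
    return (X * (d ** 2)[:, None]).T @ X / (len(X) * sigma ** 2)

for t in range(2000):
    x = rng.standard_normal(n)
    y = np.tanh(w_true @ x) + sigma * rng.standard_normal()  # noisy teacher output
    err = y - np.tanh(w @ x)
    # Gradient of the negative log-likelihood of the Gaussian noise model
    grad = -err * (1.0 - np.tanh(w @ x) ** 2) * x / sigma ** 2
    X = rng.standard_normal((200, n))                        # fresh inputs to estimate G
    G = fisher(w, X, sigma) + 1e-6 * np.eye(n)               # small ridge for invertibility
    w -= eta * np.linalg.solve(G, grad)                      # natural gradient step

print("distance to teacher:", np.linalg.norm(w - w_true))
```

In this sketch the explicit solve costs O(n^3) per step; the point of the paper's scheme is that, for a two-layer perceptron whose input dimension n greatly exceeds the number of hidden neurons, the structure of the Fisher matrix lets the natural gradient be computed in O(n) instead.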