Information geometry of neural networks: an overview

The set of all neural networks of a fixed architecture forms a geometrical manifold in which the modifiable connection weights play the role of coordinates. To understand the information-processing capability of neural networks, it is important to study all such networks as a whole rather than the behavior of each individual network. What is the natural geometry to introduce on the manifold of neural networks? Information geometry gives an answer, providing a Riemannian metric and a dual pair of affine connections. This paper gives an overview of the information geometry of neural networks.
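As a small illustration of what the Riemannian metric looks like in the simplest case, the sketch below computes the Fisher information matrix, which is the metric information geometry uses, for a single stochastic binary neuron. The model is an assumption made for illustration and is not taken from the paper: the neuron fires with probability sigmoid(w · x), and the names `sigmoid`, `fisher_metric`, `xs`, and `w` are hypothetical.

```python
import numpy as np

def sigmoid(u):
    """Logistic function, used here as the firing probability."""
    return 1.0 / (1.0 + np.exp(-u))

def fisher_metric(w, xs):
    """Fisher information matrix G(w) for p(y=1 | x; w) = sigmoid(w . x).

    Assumed model (illustrative only): one binary stochastic neuron.
    G(w) is the expectation of the outer product of the score
    d/dw log p(y | x; w), taken exactly over y and averaged over xs.
    """
    dim = w.shape[0]
    G = np.zeros((dim, dim))
    for x in xs:
        p1 = sigmoid(w @ x)
        # Sum over both outcomes y in {1, 0}, weighted by their probability.
        for y, p in ((1, p1), (0, 1.0 - p1)):
            score = (y - p1) * x  # gradient of log p(y | x; w) w.r.t. w
            G += p * np.outer(score, score)
    return G / len(xs)

# Hypothetical inputs and weights, chosen only to exercise the function.
rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 3))
w = np.array([0.5, -1.0, 0.25])

G = fisher_metric(w, xs)

# For this model the metric has the closed form E_x[s (1 - s) x x^T].
s = sigmoid(xs @ w)
G_closed = (xs * (s * (1.0 - s))[:, None]).T @ xs / len(xs)
```

Here the weights `w` are the coordinates of the manifold, and `G` defines the squared length of a small weight change `dw` as `dw @ G @ dw`. The dual affine connections mentioned in the abstract are not covered by this sketch.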
