Information geometry on hierarchy of probability distributions

An exponential family or mixture family of probability distributions has a natural hierarchical structure. This paper gives an "orthogonal" decomposition of such a system based on information geometry. A typical example is the decomposition of stochastic dependency among a number of random variables, which in general exhibit a complex structure of dependencies. Pairwise dependency is easily represented by correlation, but it is more difficult to measure the effects of pure triplewise or higher order interactions (dependencies) among these variables. Stochastic dependency is decomposed quantitatively into an "orthogonal" sum of pairwise, triplewise, and further higher order dependencies, giving a new invariant decomposition of joint entropy. This problem is important for extracting intrinsic interactions in the firing patterns of an ensemble of neurons and for estimating their functional connections. The orthogonal decomposition is given for a wide class of hierarchical structures including both exponential and mixture families. As an example, the dependency in a higher order Markov chain is decomposed into a sum of the dependencies in lower order Markov chains.
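To make the decomposition concrete, here is a minimal numerical sketch (not code from the paper) in Python/NumPy for three binary variables. It computes p1, the independent first-order maximum-entropy model, and p2, the maximum-entropy model with the same pairwise marginals as p, obtained here by iterative proportional fitting; the function names kl and project_pairwise, the iteration count, and the random test distribution are all illustrative choices. Because E_2 (the pairwise exponential family) is e-flat and p2 is the m-projection of p onto it, the generalized Pythagorean theorem splits the total dependency D(p || p1) into a pure pairwise part D(p2 || p1) and a pure triplewise part D(p || p2).

import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) in nats; assumes q > 0 wherever p > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def project_pairwise(p, n_iter=500):
    # m-projection of p onto E_2: the maximum-entropy distribution with the
    # same pairwise marginals as p, computed by iterative proportional fitting
    q = np.full_like(p, 1.0 / p.size)          # start from the uniform distribution
    for _ in range(n_iter):
        for ax in range(3):                    # match each of the three pairwise marginals
            target = p.sum(axis=ax, keepdims=True)
            q = q * (target / q.sum(axis=ax, keepdims=True))
    return q

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                                   # a generic joint distribution of 3 binary variables

# p1: independent (first-order) maximum-entropy model with p's single-variable marginals
m0 = p.sum(axis=(1, 2)); m1 = p.sum(axis=(0, 2)); m2 = p.sum(axis=(0, 1))
p1 = m0[:, None, None] * m1[None, :, None] * m2[None, None, :]

p2 = project_pairwise(p)

# Orthogonal (Pythagorean) decomposition of total dependency:
#   D(p || p1) = D(p || p2) + D(p2 || p1)
# D(p2 || p1): pure pairwise dependency; D(p || p2): pure triplewise interaction
print(kl(p, p1), kl(p, p2) + kl(p2, p1))       # the two numbers should agree

When run, the two printed values agree to within the convergence tolerance of the iterative fitting, illustrating that the pairwise and triplewise contributions add up orthogonally to the total dependency.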
