Paper information - DIVERGENCE FUNCTION, INFORMATION MONOTONICITY AND INFORMATION GEOMETRY

A divergence function measures how different two points are in a base space. Well-known examples are the Kullback-Leibler divergence and the f-divergences, which are defined on a manifold of probability distributions. The Bregman divergence is used in a more general situation. The present paper characterizes the geometrical structure that a divergence function induces, and proves that the f-divergences are unique in the sense of information invariance, giving the alpha-geometrical structure. Bregman divergences are characterized by a dually flat geometrical structure. The paper also studies geometrical properties of hierarchical models that include singular structure.
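As a small numerical illustration of the divergences named in the abstract, the following sketch checks that the Kullback-Leibler divergence on the probability simplex can be written both as an f-divergence and as a Bregman divergence, and that merging outcomes (coarse-graining) does not increase it. The function names, the example distributions, and the convention D_f(p || q) = sum_i q_i f(p_i / q_i) are assumptions made for this example; the paper may use a different argument order or normalization.

```python
import math

def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i), for strictly positive p and q.
    Assumed convention; other texts swap the roles of p and q."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def bregman_divergence(p, q, phi, grad_phi):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for convex phi."""
    g = grad_phi(q)
    return phi(p) - phi(q) - sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))

def kl(p, q):
    """Kullback-Leibler divergence computed directly from its definition."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def neg_entropy(p):
    """Convex potential phi(p) = sum_i p_i log p_i."""
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    """Gradient of the negative entropy: log p_i + 1."""
    return [math.log(pi) + 1.0 for pi in p]

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl_direct = kl(p, q)
kl_as_f = f_divergence(p, q, lambda u: u * math.log(u))        # f(u) = u log u
kl_as_bregman = bregman_divergence(p, q, neg_entropy, neg_entropy_grad)

# Coarse-graining: merge the last two outcomes into one.
p_coarse = [p[0], p[1] + p[2]]
q_coarse = [q[0], q[1] + q[2]]
kl_coarse = kl(p_coarse, q_coarse)

print(kl_direct, kl_as_f, kl_as_bregman, kl_coarse)
```

The three full-resolution values agree up to floating-point error, and the coarse-grained value is no larger than the original one, which is the monotonicity behaviour the title refers to.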

Shun-ichi Amari | S. Amari

[1] S. M. Ali, et al. A General Class of Coefficients of Divergence of One Distribution from Another, 1966.

[2] Shun-ichi Amari, et al. Dynamics of Learning in Multilayer Perceptrons Near Singularities, 2008, IEEE Transactions on Neural Networks.

[3] S. Amari. Integration of Stochastic Models by Minimizing α-Divergence, 2007, Neural Computation.

[4] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.

[5] N. Čencov. Statistical Decision Rules and Optimal Inference, 2000.

[6] Shun-ichi Amari, et al. α-Divergence Is Unique, Belonging to Both f-Divergence and Bregman Divergence Classes, 2009, IEEE Transactions on Information Theory.

[7] Inderjit S. Dhillon, et al. Matrix Nearness Problems with Bregman Divergences, 2007, SIAM J. Matrix Anal. Appl.

[8] Shun-ichi Amari, et al. Dynamics of Learning Near Singularities in Layered Networks, 2008, Neural Computation.

[9] Shun-ichi Amari, et al. Information Geometry and Its Applications: Convex Function and Dually Flat Manifold, 2009, ETVC.

[10] D. Petz. Monotone metrics on matrix spaces, 1996.

[11] S. Amari, et al. Singularities Affect Dynamics of Learning in Neuromanifolds, 2006, Neural Computation.

[12] Imre Csiszár, et al. Axiomatic Characterizations of Information Measures, 2008, Entropy.

[13] Jan Havrda, et al. Quantification method of classification processes. Concept of structural a-entropy, 1967, Kybernetika.

[14] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations, 1952.

[15] Shun-ichi Amari, et al. Dynamics of learning near singularities in radial basis function networks, 2008, Neural Networks.

[16] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.

[17] Inder Jeet Taneja, et al. Relative information of type s, Csiszár's f-divergence, and information inequalities, 2004, Inf. Sci.

[18] Kenji Fukumizu, et al. Local minima and plateaus in hierarchical structures of multilayer perceptrons, 2000, Neural Networks.

[19] Yasuo Matsuyama, et al. The alpha-EM algorithm: surrogate likelihood maximization using alpha-logarithmic information measures, 2003, IEEE Trans. Inf. Theory.

[20] J. Milnor. On the concept of attractor, 1985.

[21] A. Rényi. On Measures of Entropy and Information, 1961.

[22] Shun-ichi Amari, et al. Methods of information geometry, 2000.

[23] Inderjit S. Dhillon, et al. Clustering with Bregman Divergences, 2005, J. Mach. Learn. Res.

[24] I. Csiszár. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems, 1991.