DIFFERENTIAL GEOMETRY DERIVED FROM DIVERGENCE FUNCTIONS: INFORMATION GEOMETRY APPROACH

We study the differential-geometric structure of an information manifold equipped with a divergence function. A divergence function generates a Riemannian metric and, when the divergence is asymmetric, it also provides a symmetric third-order tensor. These induce a pair of affine connections that are dually coupled to each other with respect to the Riemannian metric. This dualistic structure is the one that emerged from information geometry. When a manifold is dually flat (it may still be curved in the sense of the Levi-Civita connection), there exist a canonical divergence and a pair of convex functions from which the original dual geometry can be reconstructed. The generalized Pythagorean theorem and the projection theorem hold in such a manifold. This structure has many applications in the information sciences, including statistics, machine learning, optimization, computer vision and Tsallis statistical mechanics. The present article reviews the structure of information geometry and its relation to divergence functions. We further consider the conformal structure arising from generalized statistical models in relation to power laws.
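The following Python is a minimal sketch of the dually flat case, not the article's own construction. It assumes an illustrative convex potential ψ(θ) = Σ exp θ_i (the manifold of positive measures), whose Legendre dual is φ(η) = Σ (η_i log η_i − η_i). From these two convex functions it builds the canonical divergence D[P:Q] = ψ(θ_P) + φ(η_Q) − θ_P·η_Q, which is a Bregman divergence. It then recovers the Riemannian metric as the Hessian of D in its first argument on the diagonal and checks the generalized Pythagorean theorem for three points where the θ-line from Q to P is orthogonal to the η-line from Q to R. All function names are hypothetical, chosen only for this illustration.

```python
import math

# Illustrative assumption: psi(theta) = sum_i exp(theta_i) on positive measures.
# Any smooth, strictly convex psi would define a dually flat manifold.

def psi(theta):
    """Convex potential in the theta (affine) coordinates."""
    return sum(math.exp(t) for t in theta)

def eta_of(theta):
    """Dual coordinates eta = grad psi(theta)."""
    return [math.exp(t) for t in theta]

def theta_of(eta):
    """Inverse Legendre map theta = grad phi(eta); requires every eta_i > 0."""
    return [math.log(e) for e in eta]

def phi(eta):
    """Legendre dual of psi: phi(eta) = sum_i (eta_i log eta_i - eta_i)."""
    return sum(e * math.log(e) - e for e in eta)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def canonical_divergence(theta_p, theta_q):
    """D[P:Q] = psi(theta_P) + phi(eta_Q) - theta_P . eta_Q (a Bregman divergence)."""
    eta_q = eta_of(theta_q)
    return psi(theta_p) + phi(eta_q) - dot(theta_p, eta_q)

def metric_from_divergence(theta, h=1e-4):
    """g_ij(theta): Hessian of D[. : Q] in its first argument at P = Q = theta,
    estimated with central finite differences (equals the Hessian of psi)."""
    n = len(theta)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(theta)
                p[i] += di
                p[j] += dj
                return canonical_divergence(p, theta)
            g[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return g

# Generalized Pythagorean theorem: if (theta_P - theta_Q) . (eta_R - eta_Q) = 0,
# i.e. the theta-line Q->P is orthogonal to the eta-line Q->R at Q, then
# D[P:R] = D[P:Q] + D[Q:R].
theta_q = [0.0, 0.0]               # eta_Q = (1, 1)
theta_p = [0.5, 0.5]               # theta-direction Q->P is (0.5, 0.5)
eta_r = [1.5, 0.5]                 # eta-direction Q->R is (0.5, -0.5): orthogonal
theta_r = theta_of(eta_r)

lhs = canonical_divergence(theta_p, theta_r)
rhs = canonical_divergence(theta_p, theta_q) + canonical_divergence(theta_q, theta_r)
print("D[P:R] =", lhs, " D[P:Q] + D[Q:R] =", rhs)

# The divergence is asymmetric: D[P:Q] differs from D[Q:P].
print("D[P:Q] =", canonical_divergence(theta_p, theta_q),
      " D[Q:P] =", canonical_divergence(theta_q, theta_p))

# The induced metric is diag(exp(theta_i)) for this choice of psi.
print(metric_from_divergence([0.5, -0.3]))
```

The three-point identity behind the check is D[P:R] = D[P:Q] + D[Q:R] − (θ_P − θ_Q)·(η_R − η_Q), so the Pythagorean relation holds exactly when the last term vanishes. Finite differences are used for the metric only to keep the sketch dependency-free; for this ψ the Hessian is known in closed form.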
