$\alpha$-Divergence Is Unique, Belonging to Both $f$-Divergence and Bregman Divergence Classes

A divergence measure between two probability distributions or positive arrays (positive measures) is a useful tool for solving problems in optimization, signal processing, machine learning, and statistical inference. The Csiszár $f$-divergence is the unique class of divergences possessing information monotonicity, from which the dual $\alpha$-geometrical structure with the Fisher metric is derived. The Bregman divergence is another class of divergences; it induces a dually flat geometrical structure, which in general differs from the $\alpha$-structure. Csiszár gave an axiomatic characterization of divergences related to inference problems. The Kullback-Leibler divergence is proved to belong to both classes, and it is the only such divergence in the space of probability distributions. This paper proves that the $\alpha$-divergences constitute the unique class belonging to both classes when the space of positive measures or positive arrays is considered. They are the canonical divergences derived from the dually flat geometrical structure of the space of positive measures.
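For reference, here is a minimal sketch of the standard definitions the abstract refers to, written in one common convention; the notation ($D_f$, $D_\varphi$, $D_\alpha$, $\mathbf{p}$, $\mathbf{q}$) is ours, and the sign and scaling conventions for $\alpha$ vary across references, so the paper's own formulas may differ in detail. For positive arrays $\mathbf{p}=(p_i)$, $\mathbf{q}=(q_i)$,

$$
D_f[\mathbf{p}:\mathbf{q}] = \sum_i q_i\, f\!\left(\frac{p_i}{q_i}\right), \qquad f \text{ convex},\ f(1)=0,
$$

$$
D_\varphi[\mathbf{p}:\mathbf{q}] = \varphi(\mathbf{p}) - \varphi(\mathbf{q}) - \nabla\varphi(\mathbf{q})\cdot(\mathbf{p}-\mathbf{q}), \qquad \varphi \text{ strictly convex},
$$

$$
D_\alpha[\mathbf{p}:\mathbf{q}] = \frac{4}{1-\alpha^2}\sum_i\left(\frac{1-\alpha}{2}\,p_i + \frac{1+\alpha}{2}\,q_i - p_i^{\frac{1-\alpha}{2}}\, q_i^{\frac{1+\alpha}{2}}\right), \qquad \alpha \neq \pm 1,
$$

where the limits $\alpha \to \pm 1$ give the Kullback-Leibler divergence and its dual, extended to positive measures as $\sum_i \bigl(q_i - p_i + p_i \log (p_i/q_i)\bigr)$. Note that $\mathbf{p}$ and $\mathbf{q}$ need not be normalized, which is the setting of positive measures in which the paper's uniqueness result is stated.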
