Divergence, Optimization and Geometry

Measures of divergence are used in many engineering fields, such as statistics, mathematical programming, computational vision, and neural networks. The Kullback-Leibler divergence is a typical example: it is defined between two probability distributions and is invariant under information-preserving transformations of the underlying random variables. The Bregman divergence is another type, often used in optimization and signal processing; it is a class of divergences that induces a dually flat geometrical structure. A divergence is often minimized to reduce the discrepancy between observed evidence and an underlying model, and projection onto the model subspace plays a fundamental role. Here, geometry is important, and the dually flat geodesic structure is particularly useful because a generalized Pythagorean theorem and a projection theorem hold.
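
To make the two divergence classes concrete, here is a minimal NumPy sketch (an illustration added to this summary, not part of the original abstract). It builds a Bregman divergence from a convex generator: the negative-entropy generator recovers the Kullback-Leibler divergence, and the squared-Euclidean generator gives the simplest instance of the Pythagorean/projection relation, using projection onto an affine set as a stand-in "model subspace". The function and variable names (`bregman`, `neg_entropy`, `a`, `b`, `x`) are hypothetical choices for this sketch.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# --- Negative entropy as generator: recovers the KL divergence ---
neg_entropy = lambda p: np.sum(p * np.log(p))
neg_entropy_grad = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

kl_direct = np.sum(p * np.log(p / q))                    # sum_i p_i log(p_i / q_i)
kl_bregman = bregman(neg_entropy, neg_entropy_grad, p, q)
print(kl_direct, kl_bregman)  # agree, since p and q both sum to 1

# --- Squared-Euclidean generator: Pythagorean relation under projection ---
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# Project x onto the affine set {z : a^T z = b}, playing the role of the model.
a, b = np.array([1.0, 1.0, 1.0]), 1.0
x = np.array([0.6, 0.1, 0.6])
x_star = x - (np.dot(a, x) - b) / np.dot(a, a) * a       # orthogonal projection

z = np.array([0.2, 0.3, 0.5])                            # any point in the set
assert np.isclose(np.dot(a, z), b)

lhs = bregman(sq, sq_grad, x, z)
rhs = bregman(sq, sq_grad, x, x_star) + bregman(sq, sq_grad, x_star, z)
print(lhs, rhs)  # equal: D(x, z) = D(x, x*) + D(x*, z)
```

For the negative-entropy generator the two computations agree because both vectors sum to one; the affine-set example is the degenerate case in which the dually flat structure reduces to ordinary Euclidean orthogonality, which is what makes the Pythagorean decomposition easy to verify numerically.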
