Divergence, Optimization and Geometry

Measures of divergence are used in many engineering fields, such as statistics, mathematical programming, computational vision, and neural networks. The Kullback-Leibler divergence is a typical example: it is defined between two probability distributions and is invariant under information-preserving transformations of the underlying random variables. The Bregman divergence is another type, often used in optimization and signal processing; it is a class of divergences that induces a dually flat geometrical structure. A divergence is often minimized to reduce the discrepancy between observed evidence and an underlying model, and projection onto the model subspace plays a fundamental role. Here, geometry is important, and the dually flat geodesic structure is particularly useful because a generalized Pythagorean theorem and a projection theorem hold.
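
To make the two divergence classes concrete, here is a minimal NumPy sketch (an illustration added to this summary, not part of the original abstract). It builds a Bregman divergence from a convex generator: the negative-entropy generator recovers the Kullback-Leibler divergence, and the squared-Euclidean generator gives the simplest instance of the Pythagorean/projection relation, using projection onto an affine set as a stand-in "model subspace". The function and variable names (`bregman`, `neg_entropy`, `a`, `b`, `x`) are hypothetical choices for this sketch.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# --- Negative entropy as generator: recovers the KL divergence ---
neg_entropy = lambda p: np.sum(p * np.log(p))
neg_entropy_grad = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

kl_direct = np.sum(p * np.log(p / q))                    # sum_i p_i log(p_i / q_i)
kl_bregman = bregman(neg_entropy, neg_entropy_grad, p, q)
print(kl_direct, kl_bregman)  # agree, since p and q both sum to 1

# --- Squared-Euclidean generator: Pythagorean relation under projection ---
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# Project x onto the affine set {z : a^T z = b}, playing the role of the model.
a, b = np.array([1.0, 1.0, 1.0]), 1.0
x = np.array([0.6, 0.1, 0.6])
x_star = x - (np.dot(a, x) - b) / np.dot(a, a) * a       # orthogonal projection

z = np.array([0.2, 0.3, 0.5])                            # any point in the set
assert np.isclose(np.dot(a, z), b)

lhs = bregman(sq, sq_grad, x, z)
rhs = bregman(sq, sq_grad, x, x_star) + bregman(sq, sq_grad, x_star, z)
print(lhs, rhs)  # equal: D(x, z) = D(x, x*) + D(x*, z)
```

For the negative-entropy generator the two computations agree because both vectors sum to one; the affine-set example is the degenerate case in which the dually flat structure reduces to ordinary Euclidean orthogonality, which is what makes the Pythagorean decomposition easy to verify numerically.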
