On Conformal Divergences and Their Population Minimizers

Total Bregman divergences are a recent modification of ordinary Bregman divergences, originally motivated by applications that required invariance under rotations. They have displayed superior results compared with ordinary Bregman divergences on several clustering, computer vision, medical imaging, and machine learning tasks. These preliminary results raise two important problems. First, report a complete characterization of the left and right population minimizers for this class of total Bregman divergences. Second, characterize a principled superset of total and ordinary Bregman divergences with good clustering properties, from which one could tailor the choice of a divergence to a particular application. In this paper, we provide and study one such superset with interesting geometric features, which we call conformal divergences, and focus on their left and right population minimizers. Our results are obtained in a recently coined (u, v)-geometric structure that generalizes the dually flat affine connections of information geometry. We characterize the population minimizers both analytically and geometrically. We prove that conformal divergences (resp. total Bregman divergences) are essentially exhaustive for their left (resp. right) population minimizers. We further report new results and extend previous results on the robustness to outliers of the left and right population minimizers, and discuss the role of the (u, v)-geometric structure in clustering. Additional results are also given.
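To make the objects named above concrete, the following is a minimal sketch, not taken from the paper. It assumes the commonly used definition of the total Bregman divergence, in which the ordinary Bregman divergence D_f(x, y) is divided by sqrt(1 + ||∇f(y)||²), and it assumes the generator f(x) = ½||x||², chosen only because its gradient is the identity. The function names (`bregman`, `total_bregman`, `tbd_center`) are illustrative, and the argument order may not match the paper's left/right convention. For this generator, minimizing the summed total Bregman divergence over the first argument gives a weighted mean whose weights shrink for points of large norm, which hints at the robustness to outliers mentioned in the abstract.

```python
import math

# Assumed generator: f(x) = 0.5 * ||x||^2, so grad f(x) = x.
def f(x):
    return 0.5 * sum(t * t for t in x)

def grad_f(x):
    return list(x)

def bregman(x, y):
    """Ordinary Bregman divergence D_f(x, y) = f(x) - f(y) - <x - y, grad f(y)>."""
    g = grad_f(y)
    return f(x) - f(y) - sum((a - b) * c for a, b, c in zip(x, y, g))

def total_bregman(x, y):
    """Bregman divergence normalized by sqrt(1 + ||grad f(y)||^2) (assumed convention)."""
    g = grad_f(y)
    return bregman(x, y) / math.sqrt(1.0 + sum(c * c for c in g))

def tbd_center(points):
    """Minimizer over c of sum_i total_bregman(c, p_i) for the assumed generator.

    Setting the gradient in c to zero gives a weighted mean of the points,
    with weights 1 / sqrt(1 + ||grad f(p_i)||^2).
    """
    weights = [1.0 / math.sqrt(1.0 + sum(c * c for c in grad_f(p))) for p in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(dim)]

def mean(points):
    """Arithmetic mean, the minimizer of the unnormalized objective for this generator."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]

def objective(c, points):
    return sum(total_bregman(c, p) for p in points)

# Three points near the origin plus one far outlier.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]]
center = tbd_center(points)
average = mean(points)
print("total Bregman center:", center)
print("arithmetic mean:     ", average)
```

For a general generator the same stationarity condition applies to the gradients rather than the points, so the closed form above is specific to the assumed quadratic f. In this example the outlier at (100, 100) pulls the arithmetic mean far from the cluster, while its small weight leaves the weighted center close to the three nearby points.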
