On Clustering Histograms with k-Means by Using Mixed α-Divergences

Clustering sets of histograms has become popular thanks to the success of generic bag-of-X methods used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
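For concreteness, the sketch below illustrates what a hard k-means-style clustering under a mixed α-divergence can look like. It is a minimal illustration, not the paper's reference implementation: it assumes Amari's parameterization of the α-divergence on positive arrays (|α| ≠ 1), pairs each cluster with a left and a right center as in mixed clustering, and uses the coordinate-wise power-mean closed forms for the sided α-centroids of positive histograms. The function names (`mixed_alpha_kmeans`, `left_centroid`, `right_centroid`) are illustrative, and plain random seeding is used here in place of the mixed α-seeding analyzed in the paper.

```python
# Minimal sketch (not the authors' reference implementation) of hard k-means
# clustering of positive histograms under a mixed alpha-divergence.
# Assumptions: Amari's parameterization of the alpha-divergence on positive
# arrays with |alpha| != 1, and power-mean closed forms for sided centroids.
import numpy as np


def alpha_divergence(p, q, alpha):
    """D_alpha(p : q) for positive arrays p, q and |alpha| != 1."""
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return (4.0 / (1.0 - alpha**2)) * np.sum(a * p + b * q - p**a * q**b, axis=-1)


def left_centroid(H, alpha):
    """argmin_c sum_i D_alpha(c : h_i): power mean of exponent (1+alpha)/2."""
    b = (1.0 + alpha) / 2.0
    return np.mean(H**b, axis=0) ** (1.0 / b)


def right_centroid(H, alpha):
    """argmin_c sum_i D_alpha(h_i : c): power mean of exponent (1-alpha)/2."""
    a = (1.0 - alpha) / 2.0
    return np.mean(H**a, axis=0) ** (1.0 / a)


def mixed_alpha_kmeans(H, k, alpha=0.5, lam=0.5, n_iter=50, rng=None):
    """Each cluster keeps a (left, right) center pair; a histogram h is assigned
    to the cluster minimizing lam * D_alpha(l_j : h) + (1 - lam) * D_alpha(h : r_j)."""
    rng = np.random.default_rng(rng)
    n = H.shape[0]
    idx = rng.choice(n, size=k, replace=False)  # plain random seeding for brevity
    L, R = H[idx].copy(), H[idx].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: mixed divergence of every histogram to every center pair.
        cost = np.stack([lam * alpha_divergence(L[j], H, alpha)
                         + (1.0 - lam) * alpha_divergence(H, R[j], alpha)
                         for j in range(k)], axis=1)
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: sided alpha-centroids of each cluster.
        for j in range(k):
            members = H[labels == j]
            if len(members) > 0:
                L[j] = left_centroid(members, alpha)
                R[j] = right_centroid(members, alpha)
    return labels, L, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.dirichlet(np.ones(8), size=200) + 1e-6  # strictly positive histograms
    labels, L, R = mixed_alpha_kmeans(H, k=3, alpha=0.5, lam=0.5, rng=0)
    print(np.bincount(labels))
```

Setting λ = 1 or λ = 0 recovers the two sided (asymmetric) α-clusterings, while λ = 1/2 gives a symmetrized variant; the α → ±1 limits, which yield the dual Kullback-Leibler divergences, are not handled in this sketch.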
