On Clustering Histograms with k-Means by Using Mixed α-Divergences

Clustering sets of histograms has become popular thanks to the success of generic bag-of-X methods used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
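For concreteness, the sketch below illustrates what a hard k-means-style clustering under a mixed α-divergence can look like. It is a minimal illustration, not the paper's reference implementation: it assumes Amari's parameterization of the α-divergence on positive arrays (|α| ≠ 1), pairs each cluster with a left and a right center as in mixed clustering, and uses the coordinate-wise power-mean closed forms for the sided α-centroids of positive histograms. The function names (`mixed_alpha_kmeans`, `left_centroid`, `right_centroid`) are illustrative, and plain random seeding is used here in place of the mixed α-seeding analyzed in the paper.

```python
# Minimal sketch (not the authors' reference implementation) of hard k-means
# clustering of positive histograms under a mixed alpha-divergence.
# Assumptions: Amari's parameterization of the alpha-divergence on positive
# arrays with |alpha| != 1, and power-mean closed forms for sided centroids.
import numpy as np


def alpha_divergence(p, q, alpha):
    """D_alpha(p : q) for positive arrays p, q and |alpha| != 1."""
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return (4.0 / (1.0 - alpha**2)) * np.sum(a * p + b * q - p**a * q**b, axis=-1)


def left_centroid(H, alpha):
    """argmin_c sum_i D_alpha(c : h_i): power mean of exponent (1+alpha)/2."""
    b = (1.0 + alpha) / 2.0
    return np.mean(H**b, axis=0) ** (1.0 / b)


def right_centroid(H, alpha):
    """argmin_c sum_i D_alpha(h_i : c): power mean of exponent (1-alpha)/2."""
    a = (1.0 - alpha) / 2.0
    return np.mean(H**a, axis=0) ** (1.0 / a)


def mixed_alpha_kmeans(H, k, alpha=0.5, lam=0.5, n_iter=50, rng=None):
    """Each cluster keeps a (left, right) center pair; a histogram h is assigned
    to the cluster minimizing lam * D_alpha(l_j : h) + (1 - lam) * D_alpha(h : r_j)."""
    rng = np.random.default_rng(rng)
    n = H.shape[0]
    idx = rng.choice(n, size=k, replace=False)  # plain random seeding for brevity
    L, R = H[idx].copy(), H[idx].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: mixed divergence of every histogram to every center pair.
        cost = np.stack([lam * alpha_divergence(L[j], H, alpha)
                         + (1.0 - lam) * alpha_divergence(H, R[j], alpha)
                         for j in range(k)], axis=1)
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: sided alpha-centroids of each cluster.
        for j in range(k):
            members = H[labels == j]
            if len(members) > 0:
                L[j] = left_centroid(members, alpha)
                R[j] = right_centroid(members, alpha)
    return labels, L, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.dirichlet(np.ones(8), size=200) + 1e-6  # strictly positive histograms
    labels, L, R = mixed_alpha_kmeans(H, k=3, alpha=0.5, lam=0.5, rng=0)
    print(np.bincount(labels))
```

Setting λ = 1 or λ = 0 recovers the two sided (asymmetric) α-clusterings, while λ = 1/2 gives a symmetrized variant; the α → ±1 limits, which yield the dual Kullback-Leibler divergences, are not handled in this sketch.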
