k-Means clustering with a new divergence-based distance metric: Convergence and performance analysis

Abstract The choice of a proper similarity/dissimilarity measure is crucial in cluster analysis for revealing the natural grouping in a given dataset, and choosing the most appropriate measure has long been an open problem. Among the various approaches to incorporating a non-Euclidean dissimilarity measure into clustering, divergence-based distance functions have recently gained attention in partitional clustering. Following this direction, we propose a new point-to-point distance measure called the S-distance, motivated by the recently developed S-divergence measure (originally defined on the open cone of positive definite matrices), and discuss some of its important properties. We then develop the S-k-means algorithm (with Lloyd's heuristic), which replaces the conventional Euclidean distance of k-means with the S-distance. We also provide a theoretical analysis of the S-k-means algorithm, establishing the convergence of the obtained partial optimal solutions to a locally optimal solution. The performance of S-k-means is compared with that of the classical k-means algorithm with the Euclidean distance metric and its feature-weighted variants on several synthetic and real-life datasets. The comparative study indicates that the results of S-k-means are appealing, especially when the distribution of the clusters is not regular.
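The abstract does not state the S-distance formula, so the following is only an illustrative sketch under an explicit assumption: that the point-to-point S-distance is the S-divergence S(X, Y) = log det((X + Y)/2) - (1/2) log det(XY) evaluated on the diagonal matrices diag(x) and diag(y), which restricts it to vectors with strictly positive coordinates. The function names `s_distance` and `assign_to_nearest` and the toy data are hypothetical, and only the assignment step of a Lloyd-style iteration is shown; the centroid update is omitted because the abstract does not describe it.

```python
import numpy as np

def s_distance(x, y):
    """S-divergence of diag(x) and diag(y) (assumed form of the S-distance).

    Defined here only for vectors with strictly positive coordinates.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("s_distance requires strictly positive coordinates")
    # Per coordinate: log of the arithmetic mean minus the mean of the logs.
    return float(np.sum(np.log((x + y) / 2.0) - 0.5 * (np.log(x) + np.log(y))))

def assign_to_nearest(points, centers):
    """Assignment step: label each point with its closest center under s_distance."""
    return np.array([
        int(np.argmin([s_distance(p, c) for c in centers]))
        for p in points
    ])

# Hypothetical toy data: two well-separated groups in the positive quadrant.
points = np.array([[1.0, 1.2], [0.9, 1.1], [8.0, 9.0], [7.5, 9.5]])
centers = np.array([[1.0, 1.0], [8.0, 9.0]])
labels = assign_to_nearest(points, centers)
```

Under this assumed form, the distance is symmetric, non-negative by the arithmetic-geometric mean inequality, and zero when both arguments coincide.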

[1]  Frank Nielsen,et al.  On Clustering Histograms with k-Means by Using Mixed α-Divergences , 2014, Entropy.

[2]  Igor Vajda,et al.  On Divergences and Informations in Statistics and Information Theory , 2006, IEEE Transactions on Information Theory.

[3]  Michael K. Ng,et al.  Automated variable weighting in k-means type clustering , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Francesco Masulli,et al.  A survey of kernel and spectral methods for clustering , 2008, Pattern Recognit..

[5]  Shinto Eguchi,et al.  Spontaneous Clustering via Minimum Gamma-Divergence , 2014, Neural Computation.

[6]  Anil K. Jain Data clustering: 50 years beyond K-means , 2010, Pattern Recognit. Lett..

[7]  Renato Cordeiro de Amorim,et al.  Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering , 2012, Pattern Recognit..

[8]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[9]  Jesús Alcalá-Fdez,et al.  KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework , 2011, J. Multiple Valued Log. Soft Comput..

[10]  Marc Teboulle,et al.  A Unified Continuous Optimization Framework for Center-Based Clustering Methods , 2007, J. Mach. Learn. Res..

[11]  James Bailey,et al.  Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res..

[12]  Mithun Das Gupta,et al.  KL divergence based agglomerative clustering for automated Vitiligo grading , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Swagatam Das,et al.  Geometric divergence based fuzzy clustering with strong resilience to noise features , 2016, Pattern Recognit. Lett..

[14]  Frank Nielsen,et al.  On Conformal Divergences and Their Population Minimizers , 2013, IEEE Transactions on Information Theory.

[15]  Rui Xu,et al.  Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.

[16]  Ka-Chun Wong,et al.  A Short Survey on Data Clustering Algorithms , 2015, 2015 Second International Conference on Soft Computing and Machine Intelligence (ISCMI).

[17]  Inderjit S. Dhillon,et al.  Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..

[18]  Ulrike von Luxburg,et al.  Risk-Based Generalizations of f-divergences , 2011, ICML.

[19]  Marcel R. Ackermann,et al.  Clustering for metric and non-metric distance measures , 2008, SODA '08.

[20]  Shokri Z. Selim,et al.  K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  S. Sra Positive definite matrices and the S-divergence , 2011, 1110.1773.

[22]  W. Scott Spangler,et al.  Feature Weighting in k-Means Clustering , 2003, Machine Learning.

[23]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[24]  Frank Nielsen,et al.  Total Jensen divergences: Definition, properties and clustering , 2013, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[26]  W. Rudin Principles of mathematical analysis , 1964 .