The Sparse MinMax k-Means Algorithm for High-Dimensional Clustering

Classical clustering methods often struggle when the number of features exceeds the number of items to be partitioned. We propose a Sparse MinMax k-Means clustering approach that reformulates the objective of the MinMax k-Means algorithm (a variant of classical k-Means that minimizes the maximum intra-cluster variance instead of the sum of intra-cluster variances) as a new weighted between-cluster sum of squares (BCSS). We impose sparse regularization on these weights to make the method suitable for high-dimensional clustering. The aim is to carry the advantages of MinMax k-Means over to the high-dimensional setting and produce good-quality clusters. We demonstrate the efficacy of the proposal by comparing it against a few representative clustering methods on several real-world datasets.
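The quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the exact objective or update rules, so the function names, the toy data, and the soft-thresholding weight step (borrowed from lasso-style sparse clustering) are all assumptions made for illustration.

```python
import numpy as np

def cluster_variances(X, labels, k):
    """Sum of squared distances to the centroid, one value per cluster."""
    out = np.zeros(k)
    for j in range(k):
        pts = X[labels == j]
        if len(pts):
            out[j] = ((pts - pts.mean(axis=0)) ** 2).sum()
    return out

def per_feature_bcss(X, labels, k):
    """Between-cluster sum of squares per feature (total SS minus within SS)."""
    tss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    wcss = np.zeros(X.shape[1])
    for j in range(k):
        pts = X[labels == j]
        if len(pts):
            wcss += ((pts - pts.mean(axis=0)) ** 2).sum(axis=0)
    return tss - wcss

def sparse_weights(bcss, threshold):
    """Assumed stand-in for sparse regularization: soft-threshold the
    per-feature scores, then L2-normalize. Weak features get weight zero."""
    w = np.maximum(bcss - threshold, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Toy data: feature 0 separates the clusters, feature 1 is mostly noise,
# feature 2 is constant.
X = np.array([[0.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [10.0, 0.0, 1.0],
              [10.0, 4.0, 1.0]])
labels = np.array([0, 0, 1, 1])

variances = cluster_variances(X, labels, k=2)
sum_objective = variances.sum()   # what classical k-Means minimizes
max_objective = variances.max()   # what MinMax k-Means minimizes
bcss = per_feature_bcss(X, labels, k=2)
w = sparse_weights(bcss, threshold=1.0)
weighted_bcss = float(w @ bcss)
```

In this toy case the second cluster is looser than the first, so the maximum and the sum of the intra-cluster variances differ, and the thresholding step leaves weight only on the feature that separates the clusters.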
