Paper information - The Sparse MinMax k-Means Algorithm for High-Dimensional Clustering

Classical clustering methods often struggle when the number of features exceeds the number of items to be partitioned. We propose a Sparse MinMax k-Means clustering approach that reformulates the objective of the MinMax k-Means algorithm (a variant of classical k-Means that minimizes the maximum intra-cluster variance instead of the sum of intra-cluster variances) into a new weighted between-cluster sum of squares (BCSS) form. We impose sparse regularization on these weights to make the method suitable for high-dimensional clustering, with the aim of carrying the advantages of MinMax k-Means over to high-dimensional spaces and producing good-quality clusters. The efficacy of the proposal is demonstrated through comparison against a few representative clustering methods on several real-world datasets.
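To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It computes the classical k-Means objective (sum of intra-cluster variances), the MinMax objective (maximum intra-cluster variance), and a feature-weighted BCSS for a fixed labeling. The function names, the use of unnormalized sums of squares as "variance", and the simple per-feature weighting are all assumptions made for illustration; the abstract does not give the exact formulation or the form of the sparsity constraint.

```python
import numpy as np

def cluster_variances(X, labels, k):
    """Within-cluster sum of squared distances to the centroid, one value per cluster.
    Assumption: 'intra-cluster variance' is taken as this unnormalized sum of squares."""
    out = np.zeros(k)
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            out[c] = ((pts - pts.mean(axis=0)) ** 2).sum()
    return out

def kmeans_objective(X, labels, k):
    """Classical k-Means criterion: sum of intra-cluster variances."""
    return cluster_variances(X, labels, k).sum()

def minmax_objective(X, labels, k):
    """MinMax-style criterion: the largest intra-cluster variance."""
    return cluster_variances(X, labels, k).max()

def weighted_bcss(X, labels, k, w):
    """Feature-weighted BCSS: sum_j w[j] * (total SS_j - within SS_j).
    Hypothetical helper; the weights w would be the sparsity-regularized quantities."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            within += ((pts - pts.mean(axis=0)) ** 2).sum(axis=0)
    return float(w @ (total - within))

# Toy data: feature 0 separates the two clusters, feature 1 is uninformative.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(X, labels, 2), minmax_objective(X, labels, 2))
print(weighted_bcss(X, labels, 2, np.array([1.0, 0.0])))
```

In this toy case all of the between-cluster separation comes from the first feature, so a sparse weight vector that zeroes out the second feature loses nothing. That is the intuition behind placing a sparsity penalty on the weights when most features are uninformative.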
