A Strongly Consistent Sparse $k$-means Clustering with Direct $\ell_1$ Penalization on Variable Weights

We propose the Lasso Weighted $k$-means ($LW$-$k$-means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features ($p$) can be much larger than the number of observations ($n$). In the $LW$-$k$-means algorithm, we introduce a lasso-based penalty term directly on the feature weights to incorporate feature selection into the framework of sparse clustering. $LW$-$k$-means makes no distributional assumptions about the given dataset and thus yields a non-parametric method for feature selection. We also analytically investigate the convergence of the underlying optimization procedure in $LW$-$k$-means and establish the strong consistency of our algorithm. $LW$-$k$-means is tested on several real-life and synthetic datasets, and detailed experimental analysis shows that its performance is highly competitive with several state-of-the-art procedures for clustering and feature selection, not only in terms of clustering accuracy but also in computational time.
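To make the central idea concrete, the sketch below shows one plausible alternating scheme in which an $\ell_1$ (lasso) penalty on the feature weights drives the weights of uninformative features exactly to zero. The weight update here soft-thresholds the per-feature between-cluster dispersion, in the spirit of the sparse k-means framework of Witten and Tibshirani; it is a hypothetical simplification for illustration, not the exact $LW$-$k$-means update rules derived in the paper (the function name `lasso_weighted_kmeans`, the penalty parameter `lam`, and the weight normalization are our assumptions).

```python
import numpy as np

def soft_threshold(b, lam):
    # Soft-thresholding operator induced by an l1 penalty.
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def lasso_weighted_kmeans(X, k, lam, n_iter=20, seed=0):
    """Illustrative sparse k-means with a lasso penalty on feature weights.

    A simplified sketch, NOT the exact LW-k-means updates from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.full(p, 1.0 / np.sqrt(p))                 # initial feature weights
    centers = X[rng.choice(n, size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # 1) Assign each point to the nearest center under weighted distances.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 2) Update each center as the mean of its cluster (standard k-means step).
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
        # 3) Update weights by soft-thresholding the per-feature between-cluster
        #    dispersion, so features that do not separate the clusters receive
        #    exactly zero weight.
        total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        within = np.zeros(p)
        for c in range(k):
            if np.any(labels == c):
                within += ((X[labels == c] - centers[c]) ** 2).sum(axis=0)
        between = total - within
        w = soft_threshold(between, lam)
        norm = np.linalg.norm(w)
        if norm > 0:
            w /= norm                                # keep weights on the unit sphere
    return labels, w
```

The three block-coordinate steps (assignments, centers, weights) mirror the alternating structure whose convergence the paper analyzes; larger values of `lam` zero out more feature weights, trading clustering fit for sparsity.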
