K-means clustering via principal component analysis

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used method for unsupervised learning via data clustering. Here we prove that the principal components are the continuous solutions to the discrete cluster membership indicators of K-means clustering. New lower bounds for the K-means objective function are derived: the optimal objective for K clusters is bounded below by the total variance minus the sum of the K-1 largest eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insight into the observed effectiveness of PCA-based data reduction, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression data and Internet newsgroup data are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
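To make the bound concrete, below is a minimal numerical sketch (not part of the paper itself; the synthetic data, the choice K = 3, and the scikit-learn usage are illustrative assumptions). It computes the K-means objective on toy data and checks it against the PCA-based lower bound, i.e. the total variance minus the sum of the K-1 largest eigenvalues of the centered data's scatter matrix.

```python
# Sketch: verify the PCA-based lower bound on the K-means objective.
# Assumptions: synthetic Gaussian-blob data, K = 3, NumPy + scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
K = 3
# Synthetic data: three Gaussian blobs in 10 dimensions.
X = np.vstack([
    rng.normal(loc=c, scale=1.0, size=(100, 10))
    for c in (-4.0, 0.0, 4.0)
])

# K-means objective J_K: sum of squared distances to the nearest centroid.
km = KMeans(n_clusters=K, n_init=20, random_state=0).fit(X)
J_K = km.inertia_

# PCA-based lower bound: total variance about the mean, minus the sum of
# the K-1 leading eigenvalues of the scatter (unnormalized covariance) matrix.
Xc = X - X.mean(axis=0)
total_variance = np.sum(Xc ** 2)           # trace of the scatter matrix
eigvals = np.linalg.eigvalsh(Xc.T @ Xc)    # eigenvalues in ascending order
lower_bound = total_variance - np.sum(eigvals[-(K - 1):])

print(f"K-means objective J_K : {J_K:.2f}")
print(f"PCA lower bound       : {lower_bound:.2f}")
assert J_K >= lower_bound - 1e-6           # the bound should hold
```

Any K-means solution, locally optimal or not, has an objective at least as large as the global optimum, so the inequality must hold for the fitted clustering. On the real datasets analyzed in the paper, the bounds come within 0.5-1.5% of the optimal values; the toy check above merely verifies that the inequality is satisfied.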
