Cluster Structure of K-means Clustering via Principal Component Analysis

K-means clustering is a popular data clustering algorithm, and principal component analysis (PCA) is a widely used statistical technique for dimension reduction. Here we prove that the principal components are the continuous solutions to the discrete cluster membership indicators of K-means clustering, with a clear simplex cluster structure. These results show that PCA-based dimension reduction is particularly effective for K-means clustering. We also derive a new lower bound on the K-means objective function: the total variance minus the sum of the K-1 largest eigenvalues of the data covariance matrix.
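The lower bound can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that "total variance" and "covariance matrix" refer to the unnormalized scatter of the centered data (no division by n), and that the eigenvalues subtracted are the K-1 largest. The function names, the synthetic data, and the plain Lloyd iteration are illustrative choices only.

```python
import numpy as np


def kmeans_objective(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total


def pca_lower_bound(X, n_clusters):
    """Total scatter minus the (n_clusters - 1) largest eigenvalues of the
    unnormalized scatter matrix Y^T Y of the centered data Y (assumed scaling)."""
    Y = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Y.T @ Y)  # returned in ascending order
    top = eigvals[::-1][: n_clusters - 1]
    return (Y ** 2).sum() - top.sum()


def lloyd_kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd iterations; any K-means implementation would do here."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (n, K).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # leave an empty cluster's center unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return labels


# Synthetic data (hypothetical): three Gaussian blobs in five dimensions.
rng = np.random.default_rng(42)
means = np.array([[0, 0, 0, 0, 0], [6, 0, 0, 0, 0], [0, 6, 0, 0, 0]], dtype=float)
X = np.vstack([m + rng.normal(size=(50, 5)) for m in means])

K = 3
labels = lloyd_kmeans(X, K)
J = kmeans_objective(X, labels)
bound = pca_lower_bound(X, K)
total_scatter = ((X - X.mean(axis=0)) ** 2).sum()

print(f"lower bound   = {bound:.2f}")
print(f"K-means J_K   = {J:.2f}")
print(f"total scatter = {total_scatter:.2f}")
```

Under these assumptions the bound holds for any partition of the data into K clusters, so the check does not depend on Lloyd's algorithm reaching a global optimum; a gap between the bound and the objective that is small relative to the total scatter suggests the clustering found is close to optimal.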
