Linearized cluster assignment via spectral ordering

Spectral clustering uses eigenvectors of the Laplacian of the similarity matrix. They are most conveniently applied to 2-way clustering problems. When applying to multi-way clustering, either the 2-way spectral clustering is recursively applied or an embedding to spectral space is done and some other methods are used to cluster the points. Here we propose and study a K-way cluster assignment method. The method transforms the problem to find valleys and peaks of a 1-D quantity called cluster crossing, which measures the symmetric cluster overlap across a cut point along a linear ordering of the data points. The method can either determine K clusters in one shot or recursively split a current cluster into several smaller ones. We show that a linear ordering based on a distance sensitive objective has a continuous solution which is the eigenvector of the Laplacian, showing the close relationship between clustering and ordering. The method relies on the connectivity matrix constructed as the truncated spectral expansion of the similarity matrix, useful for revealing cluster structure. The method is applied to newsgroups to illustrate introduced concepts; experiments show it outperforms the recursive 2-way clustering and the standard K-means clustering.

[1]  Andrew B. Kahng,et al.  New spectral methods for ratio cut partitioning and clustering , 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[2]  Martine D. F. Schlag,et al.  Spectral K-Way Ratio-Cut Partitioning and Clustering , 1993, 30th ACM/IEEE Design Automation Conference.

[3]  H. D. Simon,et al.  A spectral algorithm for envelope reduction of sparse matrices , 1993, Supercomputing '93. Proceedings.

[4]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Ian Witten,et al.  Data Mining , 2000 .

[6]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[7]  Chris H. Q. Ding,et al.  A min-max cut algorithm for graph partitioning and data clustering , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[8]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[9]  Chris H. Q. Ding,et al.  Unsupervised Learning: Self-aggregation in Scaled Principal Component Space , 2002, PKDD.

[10]  Chris H. Q. Ding,et al.  Cluster merging and splitting in hierarchical clustering algorithms , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[11]  Michael I. Jordan,et al.  Learning Spectral Clustering , 2003, NIPS.

[12]  Lihao Xu,et al.  Multiway cuts and spec-tral clustering , 2003 .

[13]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.