Weighted Consensus Clustering

Consensus clustering has emerged as an important extension of the classical clustering problem. We propose weighted consensus clustering, in which each input clustering is assigned a weight, and the weights are determined so that the final consensus clustering is of higher quality, with clusters that are better separated than in standard consensus clustering. Theoretically, we show that the weight optimization in our weighted consensus clustering is equivalent to a reformulation of the well-known L1-regularized LASSO problem; our approach therefore yields sparse solutions, which helps in the difficult situation where the input clusterings diverge significantly. We also show that weighted consensus clustering resolves the redundancy problem that arises when many input clusterings are highly correlated. Detailed algorithms are given, and experiments demonstrate the effectiveness of weighted consensus clustering.
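
To make the idea concrete, below is a minimal, hedged sketch (not the authors' exact algorithm) of weighted consensus clustering with LASSO-style weight selection. It assumes numpy and scikit-learn, represents each input clustering by its co-association (connectivity) matrix, and alternates between clustering the weighted co-association matrix and refitting sparse nonnegative weights via an L1-penalized least-squares fit; the alternating scheme, the penalty value `lam`, and all function names are illustrative assumptions.

```python
# Illustrative sketch of weighted consensus clustering with LASSO-style weights.
# Not the paper's exact formulation; a simplified alternating scheme.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.linear_model import Lasso


def connectivity_matrix(labels):
    """n x n matrix with 1 where two points share a cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def weighted_consensus(input_clusterings, k, lam=0.01, n_iter=10, seed=0):
    """Alternate between (a) clustering the weighted co-association matrix
    and (b) refitting sparse nonnegative weights via an L1-penalized fit."""
    M = np.stack([connectivity_matrix(c) for c in input_clusterings])  # (r, n, n)
    r, n, _ = M.shape
    w = np.full(r, 1.0 / r)                    # start with uniform weights
    X = M.reshape(r, -1).T                     # each column: one clustering's matrix

    for _ in range(n_iter):
        # (a) consensus step: cluster the weighted co-association matrix.
        A = np.tensordot(w, M, axes=1)
        A = 0.5 * (A + A.T)                    # keep it symmetric
        labels = SpectralClustering(
            n_clusters=k, affinity="precomputed", random_state=seed
        ).fit_predict(A + 1e-8)
        C = connectivity_matrix(labels)

        # (b) weight step: sparse nonnegative weights via a LASSO-type fit,
        # so redundant or poor input clusterings can receive zero weight.
        lasso = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=5000)
        lasso.fit(X, C.ravel())
        w = lasso.coef_
        w = w / w.sum() if w.sum() > 0 else np.full(r, 1.0 / r)
    return labels, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
    # Several base clusterings of varying quality (different k, different seeds).
    base = [KMeans(n_clusters=kk, n_init=5, random_state=s).fit_predict(X)
            for kk, s in [(3, 0), (3, 1), (4, 2), (2, 3), (5, 4)]]
    labels, weights = weighted_consensus(base, k=3)
    print("weights:", np.round(weights, 3))
```

The printed weights are typically sparse: base clusterings that disagree with the emerging consensus, or that merely duplicate others, tend to be driven to zero, which mirrors the paper's point about handling divergent and redundant input clusterings.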
