Consensus Clusterings

In this paper we address the problem of combining multiple clusterings without access to the underlying features of the data. This process is known in the literature as clustering ensembles, clustering aggregation, or consensus clustering. Consensus clustering yields a stable and robust final clustering that is in agreement with multiple clusterings. We find that an iterative EM-like method is remarkably effective for this problem. We present an iterative algorithm and its variations for finding clustering consensus. An extensive empirical study compares our proposed algorithms with eleven other consensus clustering methods on four data sets using three different clustering performance metrics. The experimental results show that the new ensemble clustering methods produce clusterings that are as good as, and often better than, these other methods.

[1]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[2]  Rich Caruana,et al.  Meta Clustering , 2006, Sixth International Conference on Data Mining (ICDM'06).

[3]  Naftali Tishby,et al.  Agglomerative Information Bottleneck , 1999, NIPS.

[4]  Shashi Shekhar,et al.  Multilevel hypergraph partitioning: applications in VLSI domain , 1999, IEEE Trans. Very Large Scale Integr. Syst..

[5]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[6]  Shashi Shekhar,et al.  Multilevel hypergraph partitioning: application in VLSI domain , 1997, DAC.

[7]  Ana L. N. Fred,et al.  Data clustering using evidence accumulation , 2002, Object recognition supported by user interaction for service robots.

[8]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[9]  Marina Meila,et al.  A Comparison of Spectral Clustering Algorithms , 2003 .

[10]  Charles Elkan,et al.  Using the Triangle Inequality to Accelerate k-Means , 2003, ICML.

[11]  Aristides Gionis,et al.  Clustering aggregation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[12]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[13]  Anil K. Jain,et al.  Combining multiple weak clusterings , 2003, Third IEEE International Conference on Data Mining.

[14]  Mari Ostendorf,et al.  Combining Multiple Clustering Systems , 2004, PKDD.

[15]  G. N. Lance,et al.  A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems , 1967, Comput. J..

[16]  S. Dongen Performance criteria for graph clustering and Markov cluster experiments , 2000 .

[17]  Anil K. Jain,et al.  A Mixture Model for Clustering Ensembles , 2004, SDM.

[18]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[19]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[20]  Marina Meila,et al.  Comparing Clusterings by the Variation of Information , 2003, COLT.

[21]  George Karypis,et al.  Multilevel k-way Partitioning Scheme for Irregular Graphs , 1998, J. Parallel Distributed Comput..