Hierarchical Ensemble Clustering

Ensemble clustering has emerged as an important elaboration of the classical clustering problem. It refers to the situation in which several different (input) clusterings have been obtained for a particular dataset and the goal is to find a single (consensus) clustering that is, in some sense, a better fit than the existing clusterings. Many approaches to ensemble clustering have been developed over the last few years. However, most of these techniques are designed for partitional clustering methods, and few research efforts have been reported on ensembles of hierarchical clusterings. In this paper, we propose a hierarchical ensemble clustering framework that can naturally combine both partitional and hierarchical clustering results. We note the importance of the ultra-metric distance for hierarchical clustering and propose a novel method for learning an ultra-metric distance from the aggregated distance matrices and generating a final hierarchical clustering with enhanced cluster separation. Experimental results demonstrate the effectiveness of the proposed approach.
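To make the pipeline in the abstract concrete, the following is a minimal sketch, not the paper's actual algorithm. It assumes each input clustering is turned into a pairwise distance (a 0/1 "different cluster" indicator for partitions, normalized cophenetic distance for dendrograms), that the distances are aggregated by a plain average, and that the ultra-metric is obtained as the subdominant ultra-metric, which is the cophenetic distance of single linkage on the aggregated matrix. The function names and the aggregation rule are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def partition_distance(labels):
    """Condensed 0/1 distance: 1 where two points fall in different clusters."""
    lab = np.asarray(labels)
    square = (lab[:, None] != lab[None, :]).astype(float)
    return squareform(square, checks=False)


def dendrogram_distance(Z):
    """Condensed cophenetic distance of a linkage matrix, scaled to [0, 1]."""
    d = cophenet(Z)
    return d / d.max()


def consensus_hierarchy(condensed_list):
    """Average the input distances, then fit an ultra-metric.

    Assumption: single linkage is used because its cophenetic distance is
    the subdominant ultra-metric of the aggregated distances. The paper may
    learn the ultra-metric differently.
    """
    agg = np.mean(condensed_list, axis=0)
    Z = linkage(agg, method="single")
    return Z, cophenet(Z), agg


# Toy data: two well-separated groups of points (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (6, 2)), rng.normal(3.0, 0.3, (6, 2))])
y = pdist(X)

inputs = [
    dendrogram_distance(linkage(y, method="average")),   # hierarchical input
    dendrogram_distance(linkage(y, method="complete")),  # hierarchical input
    partition_distance([0] * 6 + [1] * 6),               # partitional input
]
Z_consensus, ultra, agg = consensus_hierarchy(inputs)
```

Here `Z_consensus` is an ordinary SciPy linkage matrix, so it can be passed to `scipy.cluster.hierarchy.dendrogram` or `fcluster` like any other hierarchy. The plain average treats every input clustering equally; a weighted average would be an equally valid choice under this sketch.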
