A Cluster Ensembles Framework

Ensemble methods solve learning problems by constructing a set of individual (diverse) solutions and then suitably aggregating them, e.g., by weighted averaging of the predictions in regression, or by taking a weighted vote on the predictions in classification. Such methods, which include Bayesian model averaging, bagging, and boosting, have become very popular for supervised learning problems. For clustering, ensembles can help to improve the quality and robustness of the results, to re-use existing "knowledge", and to deal with data-distributed situations where not all objects or features are simultaneously available for computation. Aggregation strategies can be based on the idea of minimizing "average" dissimilarity. If only the individual cluster memberships are used, this leads to an optimization problem that is in general computationally hard. For a specific dissimilarity measure, which in the crisp case measures overall discordance (modulo relabeling), the characterization of the optimal solution allows the construction of a greedy forward aggregation algorithm ("voting") which performs well on a number of clustering problems. Alternative aggregation strategies can be based on re-clustering the objects according to their rate of co-labeling (see the sketch below), or on clustering the collection of memberships of all objects, grouped according to the labels. We conclude with an outlook on possible further research on cluster ensembles.
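As a concrete illustration of the co-labeling strategy mentioned above, here is a minimal Python sketch: it builds the matrix of co-labeling rates from a collection of base partitions and re-clusters the objects on the induced dissimilarity. The function name, the use of NumPy/SciPy, and the choice of average-linkage hierarchical clustering for the re-clustering step are illustrative assumptions, not part of the framework itself.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_by_co_labeling(partitions, k):
    """Aggregate base partitions by re-clustering on co-labeling rates.

    partitions: list of 1-D integer label arrays, one per base clustering,
                all over the same n objects (labels need not be aligned).
    k:          desired number of consensus clusters.
    """
    n = len(partitions[0])
    # co[i, j] = fraction of base partitions that put objects i and j
    # in the same cluster; invariant to relabeling within each partition.
    co = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        co += (labels[:, None] == labels[None, :]).astype(float)
    co /= len(partitions)
    # Turn agreement into a dissimilarity and re-cluster the objects
    # (average linkage is an arbitrary choice for this sketch).
    dist = 1.0 - co
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

if __name__ == "__main__":
    # Two toy base partitions of six objects; label values are arbitrary.
    parts = [np.array([0, 0, 1, 1, 2, 2]),
             np.array([1, 1, 0, 0, 2, 2])]
    print(consensus_by_co_labeling(parts, k=3))  # e.g. [1 1 2 2 3 3]
```

Note that the co-labeling matrix is invariant under relabeling of each base partition, which sidesteps the label-matching problem that makes direct membership-based aggregation computationally hard.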
