Meta Clustering

Clustering is ill-defined. Unlike supervised learning, where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for an optimal clustering based on a pre-specified clustering criterion. Our approach differs: we search for many alternate clusterings of the data and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings, so that users need only examine a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, the clusterings that would be of most interest to users are often not very compact.
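The two-stage procedure described above can be sketched in plain Python. The first stage generates many alternate clusterings, and the second clusters those clusterings. This is a minimal illustration under stated assumptions, not the paper's implementation. The base clusterer is k-means run with random feature weights, produced by the hypothetical helper `random_weights`. The distance between two clusterings is the fraction of point pairs on which they disagree, which equals one minus the Rand index. The meta level uses average-link agglomerative clustering. All function names are hypothetical.

```python
import random
from itertools import combinations


def random_weights(dim, rng):
    # Assumption: diversity comes from random non-negative feature weights summing to 1.
    raw = [rng.random() for _ in range(dim)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_sq_dist(p, q, weights):
    # Feature-weighted squared Euclidean distance.
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, p, q))


def kmeans(points, k, weights, rng, iters=20):
    """Plain Lloyd's k-means under a feature-weighted distance; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: weighted_sq_dist(p, centers[c], weights))
                  for p in points]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pair_disagreement(a, b):
    """Fraction of point pairs co-clustered in one clustering but not the other (1 - Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    diff = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return diff / len(pairs)


def average_link(dist, n_items, n_groups):
    """Agglomerative average-link clustering over a precomputed distance matrix."""
    groups = [[i] for i in range(n_items)]
    while len(groups) > n_groups:
        best = None
        for g, h in combinations(range(len(groups)), 2):
            d = sum(dist[i][j] for i in groups[g] for j in groups[h])
            d /= len(groups[g]) * len(groups[h])
            if best is None or d < best[0]:
                best = (d, g, h)
        _, g, h = best
        groups[g] = groups[g] + groups[h]
        del groups[h]  # safe because g < h, so index g is unaffected
    return groups


def meta_cluster(points, k, n_clusterings, n_meta, seed=0):
    """Stage 1: many diverse base clusterings. Stage 2: group them into meta clusters."""
    rng = random.Random(seed)
    dim = len(points[0])
    clusterings = [kmeans(points, k, random_weights(dim, rng), rng)
                   for _ in range(n_clusterings)]
    dist = [[pair_disagreement(a, b) for b in clusterings] for a in clusterings]
    return clusterings, average_link(dist, n_clusterings, n_meta)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 9.8), (9.9, 0.1), (10.0, 10.0), (0.1, 0.3), (9.7, 9.9)]
    clusterings, meta = meta_cluster(pts, k=2, n_clusterings=8, n_meta=3, seed=1)
    for group in meta:
        # Show one representative base clustering per meta cluster.
        print(group, clusterings[group[0]])
```

Each meta cluster groups base clusterings that partition the points similarly. A user could therefore inspect one representative per meta cluster rather than all `n_clusterings` results. Random feature weighting is only one way to obtain diversity. Random restarts or varying `k` would fit into the same loop.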