Evidence Accumulation in Multiobjective Data Clustering

Multiobjective approaches to data clustering return sets of solutions that correspond to trade-offs between different clustering objectives. Here, an established ensemble technique (evidence accumulation) is applied to identify shared features within the set of clustering solutions returned by the multiobjective clustering method MOCK. We show that this approach can be employed to achieve a four-fold reduction in the number of candidate solutions, whilst maintaining the accuracy of MOCK’s best clustering solutions. We also find that the resulting knowledge provides a novel design basis for the visual exploration and comparison of different clustering solutions. There are clear parallels with recent work on ‘innovization’, where it was suggested that design-space analysis of the solution sets returned by multiobjective optimization may provide deep insight into the core design principles of good solutions.
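As a rough illustration of the evidence-accumulation idea, the sketch below builds a co-association matrix from a set of partitions: each entry records the fraction of partitions in which two data points share a cluster. This is a minimal sketch under stated assumptions, not the procedure used in the paper. The function name `co_association` and the three toy partitions are hypothetical, and the partitions stand in for solutions that a method such as MOCK might return.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    # Rows are partitions, columns are data points; entries are cluster labels.
    labels = np.asarray(partitions)
    # same[p, i, j] is True when points i and j share a cluster in partition p.
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

# Three hypothetical partitions of five points (illustrative labels only).
partitions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 2],
]
C = co_association(partitions)

# Pairs that are co-clustered in every partition are candidates for the
# "shared features" of the solution set.
stable_pairs = [(i, j) for i in range(C.shape[0])
                for j in range(i + 1, C.shape[1]) if C[i, j] == 1.0]
```

Entries of `C` close to 1 mark pairs of points that nearly all solutions agree on, while intermediate values mark the points whose assignment changes along the trade-off front.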
