Multiobjective clustering around medoids

The large majority of existing clustering algorithms are centered around the notion of a feature; that is, individual data items are represented by their intrinsic properties, which are summarized by (usually numeric) feature vectors. However, certain applications require the clustering of data items that are defined exclusively by extrinsic properties: only the relationships between individual data items (that is, their similarities or dissimilarities) are known. This paper develops a straightforward and efficient adaptation of our existing multiobjective clustering algorithm to such a scenario. The resulting algorithm is demonstrated on a range of data sets, including a dissimilarity matrix derived from real, non-feature-based data.
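To illustrate what clustering from relationships alone can look like, the following minimal sketch assigns items to medoids using only a pairwise dissimilarity matrix, with no feature vectors involved. It is not the algorithm developed in the paper: the function name `assign_to_medoids`, the matrix `D`, and the chosen medoid indices are all hypothetical and serve only to show the general medoid-based idea.

```python
def assign_to_medoids(dissim, medoids):
    """Assign each item to its closest medoid using only pairwise dissimilarities.

    dissim  -- square matrix (list of lists); dissim[i][j] is the dissimilarity of items i and j
    medoids -- indices of the items acting as cluster representatives (assumed given)
    """
    labels = []
    total_deviation = 0.0
    for i in range(len(dissim)):
        # Pick the medoid with the smallest dissimilarity to item i.
        nearest = min(medoids, key=lambda m: dissim[i][m])
        labels.append(nearest)
        # Accumulate the dissimilarity between item i and its medoid.
        total_deviation += dissim[i][nearest]
    return labels, total_deviation


# Hypothetical 4x4 symmetric dissimilarity matrix (illustrative values only).
D = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
]
labels, deviation = assign_to_medoids(D, medoids=[0, 3])
```

Because a medoid is itself a data item, the sketch only ever looks up entries of `D`; a centroid-based method would instead need coordinates to average, which is why medoids suit data described purely by dissimilarities.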
