Evolutionary Multiobjective Clustering

A new approach to data clustering is proposed, in which two or more measures of cluster quality are simultaneously optimized using a multiobjective evolutionary algorithm (EA). For this purpose, the PESA-II EA is adapted for the clustering problem by the incorporation of specialized mutation and initialization procedures, described herein. Two conceptually orthogonal measures of cluster quality are selected for optimization, enabling, for the first time, a clustering algorithm to explore and improve different compromise solutions during the clustering process. Our results, on a diverse suite of 15 real and synthetic data sets – where the correct classes are known – demonstrate a clear advantage to the multiobjective approach: solutions in the discovered Pareto set are objectively better than those obtained when the same EA is applied to optimize just one measure. Moreover, the multiobjective EA exhibits a far more robust level of performance than both the classic k-means and average-link agglomerative clustering algorithms, outperforming them substantially on aggregate.

[1]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[2]  Jan Komorowski,et al.  Principles of Data Mining and Knowledge Discovery , 2001, Lecture Notes in Computer Science.

[3]  Michalis Vazirgiannis,et al.  Quality Scheme Assessment in the Clustering Process , 2000, PKDD.

[4]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[5]  Anil K. Jain,et al.  A Mixture Model for Clustering Ensembles , 2004, SDM.

[6]  Nicholas J. Radcliffe,et al.  Equivalence Class Analysis of Genetic Algorithms , 1991, Complex Syst..

[7]  Martin J. Oates,et al.  The Pareto Envelope-Based Selection Algorithm for Multi-objective Optimisation , 2000, PPSN.

[8]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[9]  E. Voorhees The Effectiveness & Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval , 1985 .

[10]  Ujjwal Maulik,et al.  Genetic algorithm-based clustering technique , 2000, Pattern Recognit..

[11]  Martin J. Oates,et al.  PESA-II: region-based selection in evolutionary multiobjective optimization , 2001 .

[12]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[13]  Anil K. Jain,et al.  Multiobjective data clustering , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Rowena Cole,et al.  Clustering with genetic algorithms , 1998 .

[15]  Alexander Dekhtyar,et al.  Information Retrieval , 2018, Lecture Notes in Computer Science.

[16]  Ayhan Demiriz,et al.  Semi-Supervised Clustering Using Genetic Algorithms , 1999 .

[17]  Pedro Larrañaga,et al.  Applying genetic algorithms to search for the best hierarchical clustering of a dataset , 1999, Pattern Recognit. Lett..

[18]  H. L. Le Roy,et al.  Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; Vol. IV , 1969 .

[19]  Bernhard Korte,et al.  Optimization and Operations Research , 1976 .

[20]  Wolfgang von der Gablentz,et al.  Robust Clustering by Evolutionary Computation , 2000 .

[21]  P. Brucker On the Complexity of Clustering Problems , 1978 .

[22]  Emanuel Falkenauer,et al.  Genetic Algorithms and Grouping Problems , 1998 .