A New Evaluation Function for Clustering: The NK Internal Validation Criterion

The use of good evaluation functions is essential when evolutionary algorithms are employed for clustering. This work proposes the NK internal clustering validation measure for hard partitional clustering. The evaluation function is composed of N subfunctions, where N is the number of objects in the dataset, and each subfunction is influenced by a group of K+1 objects. By using neighbourhood relations among small connected groups, density-based regions can be identified. The NK internal clustering validation measure also allows the application of partition crossover (PX), and a PX operator for hard partitional clustering is proposed in this work as well. With PX, the evaluation function can be decomposed into q partial evaluations. As a consequence, PX deterministically finds the best of 2^q possible offspring at the cost of evaluating two solutions. In the experiments, the application of PX resulted in a high number of successful recombinations and was able to improve on the partitions defined by the best parents.
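The claim that PX finds the best of 2^q offspring from two evaluations can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the evaluation function is minimised and is a plain sum of q independent partial evaluations, one per recombining component. The function names and the numeric values are hypothetical and serve only as illustration.

```python
from itertools import product


def px_best_offspring(partials_a, partials_b):
    """Greedy per-component choice (assumes minimisation and additivity).

    partials_a[i] and partials_b[i] are the partial evaluations of
    component i when it is inherited from parent A or parent B.
    Returns (choice, total), where choice[i] is 0 for A and 1 for B.
    """
    choice = [0 if a <= b else 1 for a, b in zip(partials_a, partials_b)]
    total = sum(min(a, b) for a, b in zip(partials_a, partials_b))
    return choice, total


def brute_force_best(partials_a, partials_b):
    """Enumerate all 2^q offspring; used here only to check the greedy result."""
    q = len(partials_a)
    best = None
    for mask in product((0, 1), repeat=q):
        total = sum(
            partials_b[i] if take_b else partials_a[i]
            for i, take_b in enumerate(mask)
        )
        if best is None or total < best:
            best = total
    return best


# Hypothetical partial evaluations for q = 3 components.
partials_a = [3.0, 1.5, 4.0]
partials_b = [2.0, 2.5, 1.0]

choice, total = px_best_offspring(partials_a, partials_b)
print(choice, total)
print(brute_force_best(partials_a, partials_b))
```

Because the components do not interact under this assumption, taking the better parent for each component independently gives the same value as enumerating all 2^q combinations, while only the two parents' partial evaluations are needed.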
