Fitness Landscape Analysis in Data-Driven Optimization: An Investigation of Clustering Problems

Data-driven optimization problems such as clustering provide a representative source of real-world instances for benchmarking continuous metaheuristic optimization algorithms. Although many metaheuristics and other algorithms have been applied to clustering problems, relatively little research has explored the structure of the problem space or the fitness landscape of these problems. In contrast, problem-specific analyses and insights have been developed for several classes of combinatorial problems. This paper investigates the structure of the fitness landscapes of clustering problems, focusing on the fundamental parameters that define a problem instance: the dimensionality, number and distribution of the data points, and the number of clusters. The paper also provides a general method for active, targeted generation of problem instances based on real-world datasets. The results provide a number of new insights into this family of continuous optimization problems, as well as methods and guidelines intended to facilitate better experimental evaluation and comparison of continuous optimization algorithms.
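To make the continuous formulation concrete, the following is a minimal sketch that assumes the minimum sum-of-squares clustering criterion. The abstract does not name a specific objective, so treat this choice as an assumption. A candidate solution is a flat vector holding k cluster centers of dimension d. An instance defined by n data points in d dimensions with k clusters therefore yields a (k·d)-dimensional continuous search space. The function name `sse_objective` is hypothetical.

```python
from typing import Sequence


def sse_objective(x: Sequence[float], data: Sequence[Sequence[float]], k: int) -> float:
    """Sum of squared Euclidean distances from each data point to its nearest center.

    Assumption: the sum-of-squares clustering criterion, used here for
    illustration only. The candidate solution x is a flat vector of length
    k * d that encodes k centers of dimension d.
    """
    d = len(data[0])
    if len(x) != k * d:
        raise ValueError("solution vector must have length k * d")
    # Split the flat decision vector into k centers of dimension d.
    centers = [x[j * d:(j + 1) * d] for j in range(k)]
    total = 0.0
    for point in data:
        # Each point contributes its squared distance to the closest center.
        total += min(
            sum((p - c) ** 2 for p, c in zip(point, center))
            for center in centers
        )
    return total


if __name__ == "__main__":
    # Toy instance: n = 4 points, d = 2 dimensions, k = 2 clusters.
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(sse_objective([0.0, 0.5, 10.0, 10.5], data, k=2))  # prints 1.0
    print(sse_objective([10.0, 10.5, 0.0, 0.5], data, k=2))  # prints 1.0 (centers swapped)
```

The second call relabels the two centers and returns the same value. This shows one structural property of such landscapes: the objective is invariant under permutations of the centers. Every solution with distinct centers therefore has k! equivalent copies in the search space.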
