An overview of clustering methods

Data clustering is the process of identifying natural groupings, or clusters, within multidimensional data based on some similarity measure. Clustering is a fundamental process in many disciplines, so researchers from different fields are actively working on the clustering problem. This paper provides an overview of representative clustering methods. In addition, several cluster validation indices are described. Furthermore, approaches to automatically determine the number of clusters are presented. Finally, the application of different heuristic approaches to the clustering problem is investigated.
