Enhancing point symmetry-based distance for data clustering

This paper first proposes a new point symmetry-based similarity measure that satisfies the closure and symmetry properties of a distance function, and explains the desirable properties of the new distance in detail. A new clustering algorithm is then developed that exploits the search capability of genetic algorithms and uses the new point symmetry-based distance for cluster assignment. Points are allocated to clusters in such a way that the closure property is satisfied. The proposed algorithm, GA with newly developed point symmetry distance-based clustering (GAnPS), can detect symmetrical clusters of any size or convexity. Its effectiveness in identifying the proper partitioning is demonstrated on twenty-one data sets with varied characteristics. The performance of GAnPS is compared with that of GAPS, an existing symmetry-based genetic clustering technique, and with three popular clustering techniques: K-means, expectation maximization, and the average linkage algorithm. The utility of the proposed technique is also shown for partitioning a remote sensing satellite image. The last part of the paper develops some automatic clustering techniques using the newly proposed symmetry-based distance.
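The abstract does not give the formula for the new distance, so it cannot be reproduced here. As background only, the following minimal sketch illustrates the general idea of a point symmetry-based distance, assuming the form commonly associated with the earlier GAPS technique: reflect a point through a candidate cluster center, average the distances from the reflected point to its `knear` nearest data points, and scale by the Euclidean distance to the center. The function names, the `knear` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ps_distance(point, center, data, knear=2):
    """Illustrative point symmetry-based distance (assumed GAPS-style form).

    This is NOT the new measure proposed in the paper; it only sketches
    the underlying idea of symmetry with respect to a cluster center.
    """
    # Reflect the point through the candidate center: x* = 2c - x.
    reflected = [2 * c - p for p, c in zip(point, center)]
    # Distances from the reflected point to its knear nearest data points.
    nearest = sorted(euclidean(reflected, q) for q in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    # Scale the symmetry term by the Euclidean distance to the center.
    return d_sym * euclidean(point, center)


# Toy data: four corners of a square that is symmetric about the origin.
data = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
point = (1.0, 1.0)

d_origin = ps_distance(point, (0.0, 0.0), data)  # true center of symmetry
d_far = ps_distance(point, (3.0, 3.0), data)     # center far from the data
print(d_origin < d_far)
```

In this sketch a point would be assigned to the center that yields the smallest value; on the toy data the true center of symmetry gives a smaller distance than the distant one.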
