A Knowledge Discovery of Relationships among Dataset Entities Using Optimum Hierarchical Clustering by DE Algorithm

In recent years, discovering relationships among entities and their features in a dataset has received a great deal of attention in data analytics. This study aims to reveal the relationships among entities in a dataset according to a specific sequence of features, which is guided by the accuracy of the hierarchical clustering built from those features. A new metric, called the Discriminating Features based Cohesion (DFC) factor, is defined as a pair-wise stickiness measure among entities that indicates their degree of attachment (i.e., cohesive force). To this end, a new framework is proposed that utilizes an evolutionary algorithm, Differential Evolution (DE), for optimal discriminating feature selection and a hierarchical clustering method for computing DFC factors. The DE algorithm is employed to identify the features whose hierarchical clustering tree has the maximum accuracy; the intermediate and final DFC factor matrices are then computed using a hierarchical clustering of the most discriminating features. These matrices are used to discover knowledge about the dataset entities, including answering crucial data mining queries that cannot be answered with a standalone clustering method. As a case study, a real-world dataset is used that contains 17 entities (i.e., countries), each represented by 24 continuous features. In each step, the DE algorithm finds the most discriminating features, which are then eliminated for the next step in order to calculate a matrix of DFC factors. In the final step, the proposed method ranks the entities in terms of their DFC factors and ranks the features by their elimination order (i.e., discrimination power).
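The feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how clustering accuracy is measured, so the sketch assumes the cophenetic correlation coefficient with average linkage. It also assumes a binary feature mask obtained by thresholding DE's continuous variables at 0.5, and it uses random synthetic data of the same shape as the case study (17 entities, 24 features). The names `cophenetic_score` and `select_features` are hypothetical. The DFC factor is not computed, because its definition is not given here.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.optimize import differential_evolution
from scipy.spatial.distance import pdist


def cophenetic_score(X, mask):
    """Cophenetic correlation of an average-linkage tree built on the
    masked features (assumed stand-in for 'clustering accuracy')."""
    if not mask.any():
        return -1.0  # an empty feature subset is treated as the worst case
    distances = pdist(X[:, mask])            # condensed Euclidean distances
    tree = linkage(distances, method="average")
    corr, _ = cophenet(tree, distances)
    return float(corr) if np.isfinite(corr) else -1.0


def select_features(X, seed=0):
    """Search for a feature subset that maximizes cophenetic_score with DE."""
    n_features = X.shape[1]

    def objective(weights):
        # DE minimizes, so negate the score; threshold weights into a mask.
        return -cophenetic_score(X, weights > 0.5)

    result = differential_evolution(
        objective,
        bounds=[(0.0, 1.0)] * n_features,
        maxiter=15,
        popsize=5,
        polish=False,   # gradient polishing is pointless on a step function
        seed=seed,
    )
    return result.x > 0.5, -result.fun


# Synthetic stand-in for the case-study data: 17 entities, 24 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(17, 24))

mask, score = select_features(X)
print("selected feature indices:", np.flatnonzero(mask))
print("cophenetic correlation:", round(score, 3))
```

Under these assumptions, the stepwise procedure in the abstract would correspond to calling `select_features` repeatedly, each time dropping the columns selected in the previous step.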
