Categorical fuzzy k-modes clustering with automated feature weight learning

This article presents and investigates a new variant of the fuzzy k-Modes clustering algorithm for categorical data with automated feature weight learning. The modification strengthens the classical fuzzy k-Modes algorithm by assigning higher weights to the features that are instrumental in recognizing the clustering pattern of the data. A statistical comparison between the performance of the proposed algorithm and that of the conventional fuzzy k-Modes algorithm on synthetic and real-world datasets has been carried out with respect to mean values, best performance count, and medians. We also take a novel approach to comparing the fuzziness of the obtained clusters. To the best of our knowledge, such a comparison is reported here for the first time for categorical data. The results show that the proposed algorithm has an edge over the conventional fuzzy k-Modes algorithm in terms of both the Rand Index and the fuzziness measures.
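As a rough illustration of the ideas named in the abstract, the following Python sketch shows a feature-weighted simple-matching dissimilarity, the standard fuzzy k-Modes membership update applied to it, and the Rand Index used for evaluation. It is a sketch under stated assumptions, not the algorithm proposed in the article: the abstract does not give the weight-learning rule, so the weights are simply passed in, and the exponent `beta`, the fuzzifier `alpha`, the even split of membership among tied modes, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def weighted_mismatch(x, mode, weights, beta=2.0):
    """Weighted simple-matching dissimilarity between an object and a mode.

    Sums weights[j] ** beta over the features where x and mode disagree.
    The exponent beta is an assumed parameter, not taken from the article.
    """
    return sum(w ** beta for a, b, w in zip(x, mode, weights) if a != b)


def memberships(x, modes, weights, alpha=1.5, beta=2.0):
    """Fuzzy membership of one object in each cluster.

    Uses the usual fuzzy k-Modes update with fuzzifier alpha > 1,
    computed on the weighted dissimilarity above.
    """
    d = [weighted_mismatch(x, m, weights, beta) for m in modes]
    # An object at zero distance from a mode belongs fully to it.
    # Splitting evenly when several modes tie at zero is an assumption.
    zeros = [i for i, di in enumerate(d) if di == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(modes))]
    exponent = 1.0 / (alpha - 1.0)
    inverse = [(1.0 / di) ** exponent for di in d]
    total = sum(inverse)
    return [v / total for v in inverse]


def rand_index(labels_a, labels_b):
    """Rand Index: fraction of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    modes = [("a", "x"), ("b", "y")]
    weights = [0.9, 0.1]  # hypothetical learned weights: feature 0 matters most
    print(memberships(("a", "y"), modes, weights))
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In this toy setting the object `("a", "y")` agrees with the first mode on the heavily weighted feature, so its membership in the first cluster dominates; with equal weights the two memberships would be equal. The Rand Index ignores label names, so two partitions that differ only by relabeling score 1.0.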
