Adaptive Noise Immune Cluster Ensemble Using Affinity Propagation

Cluster ensemble is one of the main branches in the ensemble learning area which is an important research focus in recent years. The objective of cluster ensemble is to combine multiple clustering solutions in a suitable way to improve the quality of the clustering result. In this paper, we design a new noise immune cluster ensemble framework named as AP2CE to tackle the challenges raised by noisy datasets. AP2CE not only takes advantage of the affinity propagation algorithm (AP) and the normalized cut algorithm (Ncut), but also possesses the characteristics of cluster ensemble. Compared with traditional cluster ensemble approaches, AP2CE is characterized by several properties. (1) It adopts multiple distance functions instead of a single Euclidean distance function to avoid the noise related to the distance function. (2) AP2CE applies AP to prune noisy attributes and generate a set of new datasets in the subspaces consists of representative attributes obtained by AP. (3) It avoids the explicit specification of the number of clusters. (4) AP2CE adopts the normalized cut algorithm as the consensus function to partition the consensus matrix and obtain the final result. In order to improve the performance of AP2CE, the adaptive AP2CE is designed, which makes use of an adaptive process to optimize a newly designed objective function. The experiments on both synthetic and real datasets show that (1) AP2CE works well on most of the datasets, in particular the noisy datasets; (2) AP2CE is a better choice for most of the datasets when compared with other cluster ensemble approaches; (3) AP2CE has the capability to provide more accurate, stable and robust results.

[1]  Patricio A. Vela,et al.  A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm , 2012, Expert Syst. Appl..

[2]  Xiaoli Z. Fern,et al.  Cluster Ensemble Selection , 2008, Stat. Anal. Data Min..

[3]  Zhiwen Yu,et al.  Ensemble based 3D human motion classification , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[4]  Ioannis T. Christou,et al.  Coordination of Cluster Ensembles via Exact Methods , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Yun Yang,et al.  Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations , 2011, IEEE Transactions on Knowledge and Data Engineering.

[6]  E. Lander,et al.  Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Carlotta Domeniconi,et al.  Weighted cluster ensembles: Methods and analysis , 2009, TKDD.

[8]  Zhiwen Yu,et al.  Graph-based consensus clustering for class discovery from gene expression data , 2007, Bioinform..

[9]  Lawrence O. Hall,et al.  A scalable framework for cluster ensembles , 2009, Pattern Recognit..

[10]  Ludmila I. Kuncheva,et al.  Moderate diversity for better cluster ensembles , 2006, Inf. Fusion.

[11]  Tossapon Boongoen,et al.  A Link-Based Approach to the Cluster Ensemble Problem , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Anil K. Jain,et al.  Adaptive clustering ensembles , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[13]  Ana L. N. Fred,et al.  Analysis of consensus partition in cluster ensemble , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[14]  Tsaipei Wang,et al.  CA-Tree: A Hierarchical Structure for Efficient and Scalable Coassociation-Based Cluster Ensembles , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[15]  Jill P. Mesirov,et al.  Subclass Mapping: Identifying Common Subtypes in Independent Disease Data Sets , 2007, PloS one.

[16]  Tossapon Boongoen,et al.  LCE: a link-based cluster ensemble method for improved gene expression data analysis , 2010, Bioinform..

[17]  Zhiwen Yu,et al.  Knowledge Based Cluster Ensemble for Cancer Discovery From Biomolecular Data , 2011, IEEE Transactions on NanoBioscience.

[18]  Hau-San Wong,et al.  ARImp: A Generalized Adjusted Rand Index for Cluster Ensembles , 2010, 2010 20th International Conference on Pattern Recognition.

[19]  Jane You,et al.  Hybrid Fuzzy Cluster Ensemble Framework for Tumor Clustering from Biomolecular Data , 2013, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[20]  Arindam Banerjee,et al.  Bayesian cluster ensembles , 2009, Stat. Anal. Data Min..

[21]  Jane You,et al.  SC³: Triple Spectral Clustering-Based Consensus Clustering Framework for Class Discovery from Cancer Gene Expression Profiles , 2012, TCBB.

[22]  Bing Li,et al.  Efficient Clustering Aggregation Based on Data Fragments , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[23]  Haytham Elghazel,et al.  Feature Selection for Unsupervised Learning Using Random Cluster Ensembles , 2010, 2010 IEEE International Conference on Data Mining.

[24]  Yunjun Gao,et al.  Hybrid clustering solution selection strategy , 2014, Pattern Recognit..

[25]  Selim Mimaroglu,et al.  DICLENS: Divisive Clustering Ensemble with Automatic Cluster Number , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[26]  Ana L. N. Fred,et al.  Combining multiple clusterings using evidence accumulation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Xiaohua Hu,et al.  Microarray Gene Cluster Identification and Annotation Through Cluster Ensemble and EM-Based Informative Textual Summarization , 2009, IEEE Transactions on Information Technology in Biomedicine.

[28]  Tossapon Boongoen,et al.  Link-based cluster ensembles for heterogeneous biological data analysis , 2010, 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[29]  Li Guo,et al.  Classifier and Cluster Ensembles for Mining Concept Drifting Data Streams , 2010, 2010 IEEE International Conference on Data Mining.

[30]  Tossapon Boongoen,et al.  A Link-Based Cluster Ensemble Approach for Categorical Data Clustering , 2012, IEEE Transactions on Knowledge and Data Engineering.

[31]  Jane You,et al.  Double Selection Based Semi-Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles , 2014, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[32]  Mohamed S. Kamel,et al.  Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Yong Chen,et al.  Automatic malware categorization using cluster ensemble , 2010, KDD.

[34]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[35]  J. Downing,et al.  Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. , 2002, Cancer cell.

[36]  Jill P. Mesirov,et al.  A resampling-based method for class discovery and visualization of gene expression microarray data , 2003 .

[37]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[38]  Ludmila I. Kuncheva,et al.  Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Fang Liu,et al.  Spectral Clustering Ensemble Applied to SAR Image Segmentation , 2008, IEEE Transactions on Geoscience and Remote Sensing.

[40]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[41]  A. Orth,et al.  Large-scale analysis of the human and mouse transcriptomes , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[42]  Abdolreza Mirzaei,et al.  A Novel Hierarchical-Clustering-Combination Scheme Based on Fuzzy-Similarity Relations , 2010, IEEE Transactions on Fuzzy Systems.

[43]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[44]  Zhiwen Yu,et al.  Fuzzy cluster ensemble and its application on 3D head model classification , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[45]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[46]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Lawrence O. Hall,et al.  Dominant Sets as a Framework for Cluster Ensembles: An Evolutionary Game Theory Approach , 2014, 2014 22nd International Conference on Pattern Recognition.

[48]  Chris H. Q. Ding,et al.  Hierarchical Ensemble Clustering , 2010, 2010 IEEE International Conference on Data Mining.

[49]  Anil K. Jain,et al.  Clustering ensembles: models of consensus and weak partitions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Mohamed S. Kamel,et al.  On voting-based consensus of cluster ensembles , 2010, Pattern Recognit..

[51]  Constantine Kotropoulos,et al.  Speaker Diarization Exploiting the Eigengap Criterion and Cluster Ensembles , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[52]  Joachim M. Buhmann,et al.  Combining partitions by probabilistic label aggregation , 2005, KDD '05.

[53]  Ashfaqur Rahman,et al.  Cluster-Oriented Ensemble Classifier: Impact of Multicluster Characterization on Ensemble Classifier Learning , 2012, IEEE Transactions on Knowledge and Data Engineering.

[54]  Yunjun Gao,et al.  Probabilistic cluster structure ensemble , 2014, Inf. Sci..

[55]  Zhiwen Yu,et al.  Class Discovery From Gene Expression Data Based on Perturbation and Cluster Ensemble , 2009, IEEE Transactions on NanoBioscience.

[56]  Witold Pedrycz,et al.  Granular Computing: Analysis and Design of Intelligent Systems , 2013 .