Affinity propagation enhanced by estimation of distribution algorithms

Tumor classification based on gene expression data can be used to select appropriate medical treatment according to the specific characteristics of a tumor. In this paper we propose the use of estimation of distribution algorithms (EDAs) to enhance the performance of affinity propagation (AP) in classification problems. AP is an efficient clustering algorithm that is based on message-passing methods and automatically identifies an exemplar for each cluster. We introduce an EDA-based procedure to compute the preferences used by the AP algorithm. Our results show that AP performance can be notably improved by using the introduced approach. Furthermore, we present evidence that classification of new data is improved by employing previously identified exemplars, with only a minor decrease in classification accuracy.
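To make the idea of computing AP preferences with an EDA concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a univariate Gaussian EDA, one preference per data point, negative squared Euclidean distance as similarity, and a hypothetical fitness (cluster purity against known class labels minus a small penalty per exemplar). The toy data, the function names and all parameter values are illustrative assumptions.

```python
import random

def affinity_propagation(S, pref, iters=50, damping=0.7):
    """Plain AP message passing; pref[i] is placed on the diagonal of S."""
    n = len(S)
    S = [[pref[i] if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v for j, v in enumerate(vals) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive responsibilities from i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    # Each exemplar labels itself; other points join the most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
              for i in range(n)]
    return exemplars, labels

def fitness(S, pref, classes):
    """Hypothetical fitness: purity minus 0.01 per exemplar (an assumption)."""
    exemplars, labels = affinity_propagation(S, pref)
    if not exemplars:
        return 0.0
    correct = 0
    for e in exemplars:
        members = [classes[i] for i in range(len(classes)) if labels[i] == e]
        correct += max(members.count(c) for c in set(members))
    return correct / len(classes) - 0.01 * len(exemplars)

def eda_preferences(S, classes, pop=12, gens=5, seed=0):
    """Univariate Gaussian EDA over the preference vector (illustrative)."""
    rng = random.Random(seed)
    n = len(S)
    off = [S[i][k] for i in range(n) for k in range(n) if i != k]
    lo, hi = min(off), max(off)
    mu = [(lo + hi) / 2.0] * n
    sigma = [(hi - lo) / 2.0] * n
    best_pref, best_fit = None, -1.0
    for _ in range(gens):
        population = [[rng.gauss(mu[i], sigma[i]) for i in range(n)]
                      for _ in range(pop)]
        scored = sorted(((fitness(S, p, classes), p) for p in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_pref = scored[0]
        elite = [p for _, p in scored[:pop // 2]]  # truncation selection
        for i in range(n):  # re-estimate each marginal from the elite
            vals = [p[i] for p in elite]
            mu[i] = sum(vals) / len(vals)
            var = sum((v - mu[i]) ** 2 for v in vals) / len(vals)
            sigma[i] = max(var ** 0.5, 1e-3)
    return best_pref, best_fit

# Toy data: two well-separated groups of 2D points with known classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
classes = [0] * 5 + [1] * 5
S = [[-((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) for b in points] for a in points]

best_pref, best_fit = eda_preferences(S, classes)
exemplars, labels = affinity_propagation(S, best_pref)
print("exemplars:", exemplars, "fitness:", round(best_fit, 3))
```

In this sketch the preference of a point controls how readily it becomes an exemplar, so searching over the preference vector indirectly controls both the number and the identity of the exemplars. The penalty term is only there to keep the search from making every point its own exemplar.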
