A review of estimation of distribution algorithms in bioinformatics

Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics domains. Genetic algorithms, the best-known and most representative evolutionary search technique, account for the majority of these applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They use a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that use EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain.
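As a rough illustration of the loop described above (sample candidates, keep the promising ones, learn a probabilistic model from them, sample again), the following is a minimal sketch of a univariate EDA in the style of UMDA, one of the simplest model classes. The toy objective (OneMax, counting ones in a bit string), the function name `umda_onemax`, and all parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def umda_onemax(n_bits=20, pop_size=100, n_select=50, generations=50, seed=0):
    """Sketch of a univariate EDA on the OneMax toy problem (illustrative only)."""
    rng = random.Random(seed)
    # Probabilistic model: one independent Bernoulli parameter per bit.
    probs = [0.5] * n_bits
    best = None
    for _ in range(generations):
        # Sample a new population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Rank by fitness (number of ones) and keep the promising solutions.
        population.sort(key=sum, reverse=True)
        selected = population[:n_select]
        if best is None or sum(population[0]) > sum(best):
            best = population[0]
        # Re-estimate the marginals from the selected solutions; the clamp
        # (an assumption of this sketch) keeps probabilities off 0 and 1.
        probs = [
            min(max(sum(ind[i] for ind in selected) / n_select, 0.05), 0.95)
            for i in range(n_bits)
        ]
    return best, probs


if __name__ == "__main__":
    best, probs = umda_onemax()
    print("best fitness:", sum(best))
```

Here the model treats every variable as independent. More expressive EDA variants replace the vector of marginals with a model that also captures dependencies between variables, at the cost of a more expensive learning step.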
