Protein Folding in Simplified Models With Estimation of Distribution Algorithms

Simplified lattice models have played an important role in the study of protein structure prediction and protein folding. These models are useful for obtaining an initial approximation of a protein's structure and for investigating the dynamics that govern the folding process. Estimation of distribution algorithms (EDAs) are efficient evolutionary algorithms that learn and exploit regularities of the search space in the form of probabilistic dependencies. This paper introduces the application of different EDA variants to the protein structure prediction problem in simplified models, and proposes their use as a simulation tool for analyzing the protein folding process. We develop new ideas for applying EDAs to the two- and three-dimensional (2-d and 3-d) simplified protein folding problems. The paper analyzes the rationale behind applying EDAs to these problems and clarifies the relationship between our proposal and other population-based approaches to protein folding. We argue that EDAs are an efficient alternative for many instances of the protein structure prediction problem and are well suited to a theoretical analysis of search procedures in lattice models. All the algorithms introduced are tested on a set of difficult 2-d and 3-d lattice-model instances. Some of the results obtained with EDAs are superior to those obtained with other well-known population-based optimization algorithms.
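
To make the idea concrete, the following is a minimal sketch of an EDA applied to a 2-d lattice model. It is an illustration only, not the algorithms evaluated in this paper. It assumes the hydrophobic-polar (HP) energy function (minus the number of non-consecutive H-H lattice contacts), an absolute-direction encoding of the chain, a fixed penalty for self-intersecting conformations, and the simplest possible probabilistic model: independent per-position marginals with Laplace smoothing, as in a univariate marginal distribution algorithm. The names `fold`, `hp_energy`, and `umda`, and all parameter values, are hypothetical choices made for this example.

```python
import random

# Absolute moves on the square lattice: up, right, down, left.
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def fold(moves):
    """Place a chain on the lattice; return its coordinates, or None if it self-intersects."""
    pos = (0, 0)
    coords = [pos]
    seen = {pos}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None
        seen.add(pos)
        coords.append(pos)
    return coords


def hp_energy(sequence, moves):
    """HP energy: minus the number of H-H contacts between non-consecutive residues.

    Self-intersecting conformations get energy 1 (an assumed penalty), which is
    worse than any valid conformation, since valid energies are <= 0.
    """
    coords = fold(moves)
    if coords is None:
        return 1
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


def umda(sequence, pop_size=200, generations=50, seed=0):
    """Univariate EDA: sample, select the better half, re-estimate the marginals."""
    rng = random.Random(seed)
    n = len(sequence) - 1  # number of moves needed to place the chain
    probs = [[0.25] * 4 for _ in range(n)]
    best_moves, best_energy = None, float("inf")
    for _ in range(generations):
        # Sample a population from the current probabilistic model.
        population = [
            [rng.choices(range(4), weights=p)[0] for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=lambda mv: hp_energy(sequence, mv))
        top_energy = hp_energy(sequence, population[0])
        if top_energy < best_energy:
            best_moves, best_energy = population[0], top_energy
        # Truncation selection followed by model estimation.
        selected = population[: pop_size // 2]
        for k in range(n):
            counts = [1] * 4  # Laplace smoothing keeps every move possible
            for mv in selected:
                counts[mv[k]] += 1
            total = sum(counts)
            probs[k] = [c / total for c in counts]
    return best_moves, best_energy


if __name__ == "__main__":
    moves, energy = umda("HPHPPHHPHPPHPHHPPHPH")
    print(energy, moves)
```

A univariate model ignores interactions between positions, which is exactly the kind of dependency that richer EDA models are designed to capture; the sketch is meant only to show the sample, select, and estimate loop on a lattice energy function.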
