Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction

Computational approaches to de novo protein tertiary structure prediction, including those based on the preeminent “fragment-assembly” technique, have failed to scale up fully to larger proteins (on the order of 100 residues and above). A number of limiting factors are thought to contribute to the scaling problem over and above the simple combinatorial explosion, but the key ones relate to the lack of exploration of properly diverse protein folds, and to an acute form of “deception” in the energy function, whereby low-energy conformations do not reliably correspond to native structures. In this article, we investigate solutions to both of these problems using a multistage memetic algorithm that incorporates the successful Rosetta method as a local search routine. We find that specialised genetic operators add significantly to structural diversity, and that this added diversity translates well into reaching low energies. The use of a generalised stochastic ranking procedure for selection enables the memetic algorithm to handle and traverse deep energy wells that can be considered deceptive, which further improves the algorithm's ability to obtain a much greater diversity of folds. The results should translate to a tangible improvement in the performance of protein structure prediction algorithms in blind experiments such as CASP, and potentially represent a further step towards the more challenging problem of predicting the three-dimensional shape of large proteins.
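
The selection idea mentioned above can be illustrated with a minimal sketch of classic stochastic ranking, in which adjacent individuals are compared on the primary objective with some probability and on a secondary measure otherwise. This is not the generalised procedure used in the article, whose details the abstract does not give. The function name `stochastic_rank`, the parameter `p_f`, and the use of "energy" as the objective with a generic non-negative "penalty" as the secondary measure are all assumptions made for illustration.

```python
import random


def stochastic_rank(objective, penalty, p_f=0.45, sweeps=None, rng=None):
    """Return indices ordered best-first by classic stochastic ranking.

    objective: values to minimise (assumed here to be energies).
    penalty:   non-negative secondary values to minimise (hypothetical measure).
    p_f:       probability of comparing on the objective when at least one
               of the two individuals has a non-zero penalty.
    """
    rng = rng or random.Random()
    n = len(objective)
    order = list(range(n))
    sweeps = n if sweeps is None else sweeps

    for _ in range(sweeps):
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            both_unpenalised = penalty[a] == 0 and penalty[b] == 0
            if both_unpenalised or rng.random() < p_f:
                # Compare on the primary objective (lower is better).
                worse = objective[a] > objective[b]
            else:
                # Compare on the secondary measure (lower is better).
                worse = penalty[a] > penalty[b]
            if worse:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # no swaps in a full sweep: the ranking has settled
    return order


if __name__ == "__main__":
    energies = [3.0, 1.0, 2.0]
    penalties = [0.0, 5.0, 1.0]
    print(stochastic_rank(energies, penalties, rng=random.Random(0)))
```

With `p_f` below 0.5 the ranking leans towards the secondary measure, so a low-energy individual does not automatically dominate selection. That is the property that makes this family of procedures a plausible fit for deceptive energy wells, though the article's own settings are not stated here.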
