Computationally Mapping Sequence Space To Understand Evolutionary Protein Engineering

Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, such procedures do not succeed reliably, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure‐based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non‐intuitive relationship between the error‐prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub‐like sequences that serve as particularly fruitful starting sequences for evolutionary search. We also examined genetic recombination procedures and identified tradeoffs between sequence diversity and search efficiency. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error‐prone PCR and genetic recombination procedures.
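To make the ideas of a feasible sequence space, hub-like sequences, and mutational search concrete, the following is a minimal toy sketch, not the protocol used in this work. It assumes that the feasible set is simply a given list of equal-length amino-acid strings, that a "hub" can be approximated by the number of feasible sequences one substitution away, and that error-prone PCR can be caricatured as independent per-residue substitutions (the real procedure mutates DNA, not protein). The names `neighbor_counts`, `mutate`, and `search` are hypothetical.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def neighbor_counts(feasible):
    """For each feasible sequence, count feasible sequences one substitution away.

    Assumption: a high count is used here as a crude stand-in for 'hub-like'.
    """
    counts = {s: 0 for s in feasible}
    for a, b in combinations(feasible, 2):
        if hamming(a, b) == 1:
            counts[a] += 1
            counts[b] += 1
    return counts

def mutate(seq, rate, rng):
    """Toy mutagenesis step: replace each residue with probability `rate`."""
    out = []
    for ch in seq:
        if rng.random() < rate:
            out.append(rng.choice([c for c in AMINO_ACIDS if c != ch]))
        else:
            out.append(ch)
    return "".join(out)

def search(start, feasible, rate, rounds, library_size, rng):
    """Repeated mutate-and-select rounds; returns the feasible sequences visited.

    Assumption: selection is modeled as plain membership in the feasible set.
    """
    feasible = set(feasible)
    pool = {start}
    visited = {start}
    for _ in range(rounds):
        parents = sorted(pool)  # sorted for reproducible sampling
        variants = {mutate(rng.choice(parents), rate, rng)
                    for _ in range(library_size)}
        survivors = variants & feasible
        if survivors:           # keep the old pool if nothing survived
            pool = survivors
        visited |= survivors
    return visited

if __name__ == "__main__":
    toy_feasible = ["AAA", "AAC", "ACA", "CAA", "CCC"]
    counts = neighbor_counts(toy_feasible)
    hub = max(counts, key=counts.get)
    found = search(hub, toy_feasible, rate=0.1, rounds=5,
                   library_size=200, rng=random.Random(0))
    print(hub, counts[hub], len(found))
```

Varying `rate` and `rounds` together in a sketch like this is one way to explore how the per-round mutation rate interacts with the number of rounds, and changing `start` shows how the choice of starting sequence affects how much of the feasible set is reached.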
