Analysis of a complete DNA–protein affinity landscape

The properties of biological fitness landscapes are of interest across the life sciences, from ecology to genetics to synthetic biology. For biomolecular fitness landscapes, the information we currently possess comes primarily from two sources: sparse samples obtained from directed evolution experiments, and more fine-grained but less authentic information from ‘in silico’ models (such as NK-landscapes). Here we present the entire protein-binding profile of all variants of a nucleic acid oligomer 10 bases in length, which we have obtained experimentally by a series of highly parallel on-chip assays. The resulting complete landscape of sequence-binding pairs, comprising more than one million binding measurements made in duplicate, has been analysed statistically using a number of metrics commonly applied to synthetic landscapes. These metrics show that the landscape is rugged, with many local optima, and that this ruggedness arises from a combination of experimental variation and the natural structural properties of the oligonucleotides.
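
To make "complete landscape" and "local optima" concrete, the following is a minimal sketch and not the authors' analysis. It assumes that neighbours are sequences differing by a single base (Hamming distance 1) and uses a hypothetical additive score, `toy_affinity`, in place of measured binding values. A length of 4 keeps the example small; for the 10-mers described above the same enumeration would give 4**10 = 1,048,576 sequences.

```python
from itertools import product

BASES = "ACGT"

def all_sequences(length):
    """Yield every DNA sequence of the given length (4**length in total)."""
    for letters in product(BASES, repeat=length):
        yield "".join(letters)

def neighbours(seq):
    """Yield every sequence at Hamming distance 1 from seq (assumed neighbourhood)."""
    for i, base in enumerate(seq):
        for other in BASES:
            if other != base:
                yield seq[:i] + other + seq[i + 1:]

def count_local_optima(landscape):
    """Count sequences whose value strictly exceeds that of every neighbour."""
    return sum(
        all(value > landscape[n] for n in neighbours(seq))
        for seq, value in landscape.items()
    )

def toy_affinity(seq):
    """Hypothetical additive score; a stand-in, not the measured binding data."""
    weights = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
    return sum(weights[b] for b in seq)

length = 4  # kept small for illustration; the study used length 10
landscape = {s: toy_affinity(s) for s in all_sequences(length)}
n_optima = count_local_optima(landscape)
print(len(landscape), n_optima)
```

A purely additive score like this one yields a single optimum, which is the smooth extreme; a rugged landscape of the kind reported above would instead give many sequences that pass the same test.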
