Selective prediction of interaction sites in protein structures with THEMATICS

Background
Methods are now available for the prediction of interaction sites in protein 3D structures. While many of these methods report high success rates for site prediction, often these predictions are not very selective and have low precision. Precision in site prediction is addressed using Theoretical Microscopic Titration Curves (THEMATICS), a simple computational method for the identification of active sites in enzymes. Recall and precision are measured and compared with other methods for the prediction of catalytic sites.

Results
Using a test set of 169 enzymes from the original Catalytic Residue Dataset (CatRes), it is shown that THEMATICS can deliver precise, localised site predictions. Furthermore, adjustment of the cut-off criteria can improve the recall rates for catalytic residues with only a small sacrifice in precision. Recall rates for CatRes/CSA annotated catalytic residues are 41.1%, 50.4%, and 54.2% for Z score cut-off values of 1.00, 0.99, and 0.98, respectively. The corresponding precision rates are 19.4%, 17.9%, and 16.4%. The success rate for catalytic sites is higher, with correct or partially correct predictions for 77.5%, 85.8%, and 88.2% of the enzymes in the test set, corresponding to the same respective Z score cut-offs, if only the CatRes annotations are used as the reference set. Incorporation of additional literature annotations into the reference set gives total success rates of 89.9%, 92.9%, and 94.1%, again for corresponding cut-off values of 1.00, 0.99, and 0.98. False positive rates for a 75-protein test set are 1.95%, 2.60%, and 3.12% for Z score cut-offs of 1.00, 0.99, and 0.98, respectively.

Conclusion
With a preferred cut-off value of 0.99, THEMATICS achieves a high success rate of interaction site prediction, about 86% correct or partially correct using CatRes/CSA annotations only and about 93% with an expanded reference set. Success rates for catalytic residue prediction are similar to those of other structure-based methods, but with substantially better precision and lower false positive rates. THEMATICS performs well across the spectrum of E.C. classes. The method requires only the structure of the query protein as input. THEMATICS predictions may be obtained via the web from structures in PDB format at: http://pfweb.chem.neu.edu/thematics/submit.html
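The abstract reports recall, precision, and false positive rates at the residue level. As a reading aid, here is a minimal sketch of how such rates are commonly computed from a set of predicted residues and a set of annotated catalytic residues. It uses the standard textbook definitions, which may differ in detail from those used in the paper; the function name `site_metrics` and the example residue numbers are hypothetical and are not taken from THEMATICS or CatRes.

```python
def site_metrics(predicted, annotated, all_residues):
    """Residue-level recall, precision, and false positive rate.

    Standard definitions (an assumption; the paper's exact definitions
    are not given in the abstract):
      recall              = TP / (TP + FN)
      precision           = TP / (TP + FP)
      false positive rate = FP / (FP + TN)
    """
    predicted = set(predicted)
    annotated = set(annotated)
    all_residues = set(all_residues)

    tp = len(predicted & annotated)               # predicted and annotated
    fp = len(predicted - annotated)               # predicted, not annotated
    fn = len(annotated - predicted)               # annotated, missed
    tn = len(all_residues - predicted - annotated)  # neither

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"recall": recall, "precision": precision, "false_positive_rate": fpr}


# Hypothetical 100-residue protein with three annotated catalytic residues.
example = site_metrics(
    predicted={10, 45, 50, 88},
    annotated={10, 45, 72},
    all_residues=range(1, 101),
)
print(example)
```

In this made-up example two of the three annotated residues are recovered (recall 2/3) and two of the four predictions are annotated (precision 1/2). Loosening a cut-off typically enlarges the predicted set, which tends to raise recall while lowering precision, the trade-off described in the Results.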
