The Loss and Gain of Functional Amino Acid Residues Is a Common Mechanism Causing Human Inherited Disease

Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies, however, were either limited in scope to individual molecular functions or concerned with functional effects (e.g., deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies that consider the structural environment of a residue of interest, hypothesize specific molecular effects of sequence variants, and statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attributes potentially underlying pathogenic molecular alterations. We first propose a formal model to probabilistically assess function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures, suggesting that it can be reliably used to guide experimental validation. Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption.
