Complex Networks Govern Coiled-Coil Oligomerization – Predicting and Profiling by Means of a Machine Learning Approach

Understanding the relationship between protein sequence and structure is one of the great challenges in biology. In the case of the ubiquitous coiled-coil motif, structure and occurrence have been described in extensive detail, but there is a lack of insight into the rules that govern oligomerization, i.e. how many α-helices form a given coiled coil. To shed new light on the formation of two- and three-stranded coiled coils, we developed a machine learning approach to identify rules in the form of weighted amino acid patterns. These rules form the basis of our classification tool, PrOCoil, which also visualizes the contribution of each individual amino acid to the overall oligomeric tendency of a given coiled-coil sequence. We discovered that sequence positions previously thought irrelevant to direct coiled-coil interaction have an undeniable impact on stoichiometry. Our rules also demystify the oligomerization behavior of the yeast transcription factor GCN4, which can now be described as a hybrid—part dimer and part trimer—with both theoretical and experimental justification.
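To make the idea of weighted amino acid patterns and per-residue contributions concrete, the following is a toy sketch only, not PrOCoil's actual model or weights. It assumes a pattern is a pair of residues separated by a fixed gap and anchored at the heptad position of the first residue, that each matching pattern's weight is split equally between its two residues, and that positive totals indicate trimer tendency while negative totals indicate dimer tendency. The names `TOY_WEIGHTS`, `profile`, and `score` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical pattern: (first residue, heptad position of first residue,
# number of residues in the gap, second residue).
Pattern = Tuple[str, str, int, str]

# Illustrative weights only (NOT taken from PrOCoil).
# Assumed sign convention: positive -> trimer, negative -> dimer.
TOY_WEIGHTS: Dict[Pattern, float] = {
    ("L", "d", 1, "E"): -0.5,
    ("A", "e", 1, "L"): 1.0,
}


def profile(seq: str, register: str,
            weights: Dict[Pattern, float], max_gap: int = 3) -> List[float]:
    """Return each residue's contribution to the overall score."""
    if len(seq) != len(register):
        raise ValueError("sequence and heptad register must have equal length")
    contrib = [0.0] * len(seq)
    for i in range(len(seq)):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= len(seq):
                break
            w = weights.get((seq[i], register[i], gap, seq[j]), 0.0)
            # Split the pattern weight equally between the two residues.
            contrib[i] += w / 2
            contrib[j] += w / 2
    return contrib


def score(seq: str, register: str,
          weights: Dict[Pattern, float], bias: float = 0.0) -> float:
    """Overall oligomeric tendency: sum of residue contributions plus a bias."""
    return sum(profile(seq, register, weights)) + bias


if __name__ == "__main__":
    seq, reg = "LAEL", "defg"
    print(profile(seq, reg, TOY_WEIGHTS))
    s = score(seq, reg, TOY_WEIGHTS)
    print("trimer-like" if s > 0 else "dimer-like", s)
```

Because the per-residue values sum to the total score, plotting them along the sequence gives the kind of profile the abstract describes, in which individual stretches of a sequence can lean toward different oligomeric states.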
