Use of adaptive networks to define highly predictable protein secondary-structure classes

We present an adaptive neural network method that determines new classes of protein secondary structure that are significantly more predictable from local amino-acid sequence than conventional classifications. Accurate prediction of the conventional secondary-structure classes (alpha-helix, beta-strand, and coil) from primary sequence has long been an important problem in computational molecular biology, with ramifications for multiple-sequence alignment, prediction of functionally important regions of proteins, and prediction of tertiary structure from primary sequence. The algorithm presented here uses adaptive networks to examine sequence and structure data simultaneously, as available from, for example, the Brookhaven Protein Data Bank, and to determine new secondary-structure classes that can be predicted from sequence with high accuracy. These new classes have both similarities to, and differences from, the conventional secondary-structure classes. They represent a new, nontrivial classification of protein secondary structure that is predictable from primary sequence.
