Neural Net Representations of Empirical Protein Potentials

There has recently been considerable interest in deriving and applying knowledge-based, empirical potential functions for proteins. These potentials are derived from the statistics of interacting, spatially neighboring residues, as obtained from databases of known protein crystal structures. In this paper we employ neural networks to redefine empirical potential functions as discrimination functions. This approach generalizes previous work, in which simple frequency-counting statistics are computed over a database of known protein structures, and it removes the restriction to strictly pairwise interactions. Instead of fixing adjustable parameters by frequency counting, one optimizes an objective function involving a probability distribution parameterized by a neural network. We show how our method reduces to previous work in special situations, while also allowing extensions to orders of interaction beyond pairwise. Given the close packing of proteins, steric interactions, and similar effects, including higher-order interactions is critical for developing an accurate potential. A key feature of the approach we advocate is a representation that describes the spatial locations of the interacting residues lying within a sphere of small, fixed radius around each residue. This is a "shape representation" problem that has a natural solution for the interaction neighborhoods of protein residues. We demonstrate in a series of numerical experiments that the neural network approach improves discrimination over that obtained by previous methodologies limited to pairwise interactions.
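The link between frequency counting and optimizing a parameterized probability distribution can be illustrated with a toy sketch. It is not the network used in this work: the function names `frequency_potential` and `fit_softmax`, the 3x3 contact-count matrix, and the inverse-Boltzmann form of the pair energy are all assumptions chosen for illustration. The sketch fits a softmax distribution with one free logit per residue pair (no hidden layer) by maximizing the log-likelihood, and in that special case the optimum should coincide with the empirical pair frequencies that a counting approach would use directly.

```python
import numpy as np


def frequency_potential(counts):
    """Pair energies from contact counts via frequency counting.

    Assumed form: E(a, b) = -ln(f_obs(a, b) / (f(a) * f(b))),
    where f(a) is the marginal frequency of residue type a.
    """
    counts = np.asarray(counts, dtype=float)
    pair_freq = counts / counts.sum()
    single_freq = pair_freq.sum(axis=1)
    expected = np.outer(single_freq, single_freq)
    return -np.log(pair_freq / expected)


def softmax(logits):
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()


def fit_softmax(counts, steps=5000, lr=1.0):
    """Maximum-likelihood fit of a softmax over pair categories.

    One free logit per (a, b) pair; the gradient of the mean
    log-likelihood with respect to the logits is (empirical - model).
    """
    counts = np.asarray(counts, dtype=float)
    empirical = (counts / counts.sum()).ravel()
    logits = np.zeros_like(empirical)
    for _ in range(steps):
        logits += lr * (empirical - softmax(logits))
    return softmax(logits).reshape(counts.shape)


if __name__ == "__main__":
    # Hypothetical symmetric contact counts for three residue classes.
    toy_counts = np.array([[30, 10, 5],
                           [10, 20, 15],
                           [5, 15, 40]])
    fitted = fit_softmax(toy_counts)
    print("max |fitted - empirical| =",
          np.abs(fitted - toy_counts / toy_counts.sum()).max())
    print("frequency-count energies:\n", frequency_potential(toy_counts))
```

Replacing the per-pair logits with the output of a network that takes a description of the whole residue neighborhood as input is what would let such a model go beyond pairwise terms; that step is not shown here.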
