Neural Networks for Determining Protein Specificity and Multiple Alignment of Binding Sites

We use a quantitative definition of specificity to develop a neural network that identifies common protein binding sites in a collection of unaligned DNA fragments. We show that, when simple models of the binding energy and the genome are employed, the method is equivalent to maximizing the information content of the aligned sites. The network method subsumes those simple models and can also work with more complicated ones. We demonstrate this using a Markov model of the E. coli genome and a sampling method to approximate the partition function. A variation of Gibbs sampling helps avoid local minima.
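
As a rough illustration of the quantity that the simple-model case maximizes, the sketch below computes the information content of a set of already-aligned sites relative to a background base distribution. It assumes independent positions and, by default, a uniform background (the "simple model of the genome"). The function name `information_content` and its parameters are hypothetical and chosen for this example. It is not the network method itself, only the objective it reduces to under those assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def information_content(sites, background=None):
    """Information content (in bits) of equal-length aligned DNA sites.

    Computes sum over positions i and bases b of
        f(b, i) * log2(f(b, i) / p(b)),
    where f(b, i) is the observed frequency of base b at position i and
    p(b) is the background frequency (assumed uniform if not given).
    """
    if background is None:
        # Assumption: equiprobable bases in the genome.
        background = {b: 0.25 for b in ALPHABET}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    n = len(sites)
    total = 0.0
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        for b in ALPHABET:
            f = counts[b] / n
            if f > 0:  # terms with f = 0 contribute nothing
                total += f * math.log2(f / background[b])
    return total

# Perfectly conserved 4-base site: 2 bits per position.
print(information_content(["ACGT", "ACGT", "ACGT"]))  # 8.0
# One position with all four bases equally frequent: no information.
print(information_content(["A", "C", "G", "T"]))      # 0.0
```

Passing a non-uniform `background` dictionary gives the relative-entropy form of the measure; a higher-order Markov background, as used in the abstract, would need position-dependent probabilities and is outside this sketch.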
