The p53HMM algorithm: using profile hidden Markov models to detect p53-responsive genes

Background
A computational method (called p53HMM) is presented that uses Profile Hidden Markov Models (PHMMs) to estimate the relative binding affinities of putative p53 response elements (REs), both p53 single-sites and cluster-sites. These models incorporate a novel "Corresponded Baum-Welch" training algorithm that provides increased predictive power by exploiting the redundancy of information found in the repeated, palindromic p53-binding motif. The predictive accuracy of these new models is compared against that of other predictive models, including position specific score matrices (PSSMs, or weight matrices). We also present a new dynamic acceptance threshold that depends on a putative binding site's distance from the Transcription Start Site (TSS) and on its estimated binding affinity. This new criterion for classifying putative p53-binding sites increases predictive accuracy by reducing the false positive rate.

Results
Training a Profile Hidden Markov Model with corresponding positions matching a combined-palindromic p53-binding motif creates the best p53-RE predictive model. The p53HMM algorithm is available online: http://tools.csb.ias.edu

Conclusion
Using Profile Hidden Markov Models with training methods that exploit the redundant information of the homotetramer p53 binding site provides better predictive models than weight matrices (PSSMs). These methods may also boost performance when applied to other transcription factor binding sites.
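
For readers unfamiliar with the weight-matrix baseline against which the PHMMs are compared, the following is a minimal sketch of PSSM log-odds scoring. It is not the p53HMM implementation: the toy half-sites (chosen only to fit the well-known RRRCWWGYYY half-site consensus), the uniform background, the pseudocount, and the function names `build_pssm`, `score`, and `best_hit` are all illustrative assumptions.

```python
import math

# Hypothetical toy alignment of 10-bp half-sites that fit the RRRCWWGYYY
# consensus. These are illustrative only, NOT real p53 training data.
TOY_SITES = [
    "AGACATGCCT",
    "GGGCATGTCC",
    "AAACATGTCT",
    "GGACTTGCCC",
    "AGGCAAGTCT",
]

# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pssm(sites, background=BACKGROUND, pseudocount=1.0):
    """Return one dict per column mapping base -> log2-odds score."""
    width = len(sites[0])
    pssm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2((counts[base] / total) / background[base])
             for base in "ACGT"}
        )
    return pssm


def score(pssm, seq):
    """Sum the per-column log-odds; each position is treated independently."""
    return sum(column[base] for column, base in zip(pssm, seq))


def best_hit(pssm, sequence):
    """Slide the matrix along the sequence; return (best score, start index)."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    pssm = build_pssm(TOY_SITES)
    print(best_hit(pssm, "TTTTT" + "AGACATGCCT" + "TTTTT"))
```

Because a PSSM scores each position independently and has a fixed width, it cannot by itself represent insertions or deletions, such as a variable-length spacer between half-sites. Profile HMMs handle these through insert and delete states, which is part of the general motivation for preferring them here.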
