Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities

MOTIVATION: Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone towards models of cellular processes. Recent high-throughput experimental techniques, such as chromatin immunoprecipitation (ChIP), provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and to determine their regulatory effect on gene transcription. Inferring these quantities from gene expression data and network architecture data is therefore an important computational challenge.

RESULTS: We develop a probabilistic state space model that allows genome-wide inference, from microarray data, of both transcription factor protein concentrations and their effect on the transcription rate of each target gene. We use variational inference techniques to learn the model parameters and to perform posterior inference of protein concentrations and regulatory strengths. Because the model is probabilistic, we can associate credibility intervals with our estimates, and the model provides a tool for detecting which binding events lead to significant regulation. We demonstrate the model on artificial data and on two yeast datasets in which the network structure has previously been obtained using ChIP data. Predictions from the model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell.

AVAILABILITY: MATLAB code is available from http://umber.sbs.man.ac.uk/resources/puma
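As a rough illustration of the kind of state space model described in the results, the sketch below simulates expression data from a linear-Gaussian generative process. The abstract does not give the model equations, so everything here is an assumption made for illustration: the binary connectivity matrix `X` (standing in for ChIP-derived network structure), the first-order Markov dynamics with coefficient `gamma`, the Gaussian noise, and the function name `simulate_expression` are all hypothetical and are not taken from the authors' MATLAB code.

```python
import numpy as np


def simulate_expression(X, T, gamma=0.9, noise_sd=0.1, seed=0):
    """Simulate expression from an assumed linear-Gaussian state space model.

    X        : (N genes x M factors) binary connectivity matrix (assumed known).
    T        : number of time points.
    gamma    : assumed autoregressive coefficient of the latent concentrations.
    noise_sd : standard deviation of the observation noise.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_genes, n_factors = X.shape

    # Latent transcription factor concentrations: a stationary first-order
    # Markov chain per factor (an illustrative choice of dynamics).
    c = np.zeros((n_factors, T))
    c[:, 0] = rng.normal(size=n_factors)
    for t in range(1, T):
        innovation = rng.normal(size=n_factors)
        c[:, t] = gamma * c[:, t - 1] + np.sqrt(1.0 - gamma**2) * innovation

    # Gene-specific regulatory strengths, forced to zero wherever the
    # connectivity matrix reports no binding.
    B = X * rng.normal(size=(n_genes, n_factors))

    # Per-gene baseline expression level.
    mu = rng.normal(size=n_genes)

    # Observed expression: weighted sum of bound factors + baseline + noise.
    Y = B @ c + mu[:, None] + noise_sd * rng.normal(size=(n_genes, T))
    return c, B, mu, Y


# Example: three genes, two transcription factors, five time points.
X_demo = np.array([[1, 0], [0, 1], [1, 1]])
c_demo, B_demo, mu_demo, Y_demo = simulate_expression(X_demo, T=5)
```

In a setup like this, inference would run in the opposite direction: given `Y` and `X`, estimate posterior distributions over `c` and `B`. The sketch covers only the forward (generative) direction and does not implement the variational inference mentioned in the abstract.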
