Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities

MOTIVATION: Inference of latent chemical species in biochemical interaction networks is a key problem in estimating the structure and parameters of the genetic, metabolic and protein interaction networks that underpin all biological processes. We present a framework for Bayesian marginalization of these latent chemical species through Gaussian process priors.

RESULTS: We demonstrate our general approach on three different biological examples of single input motifs, including both activation and repression of transcription. We focus in particular on the problem of inferring transcription factor activity when the concentration of active protein cannot easily be measured. We show how the uncertainty in the inferred transcription factor activity can be integrated out to derive a likelihood function that can be used to estimate regulatory model parameters. An advantage of our approach is that it avoids a coarse-grained discretization of continuous-time functions, which would introduce a large number of additional parameters to be estimated. We develop exact (for linear regulation) and approximate (for non-linear regulation) inference schemes, which are much more efficient than competing sampling-based schemes and therefore provide a practical toolkit for model-based inference.

AVAILABILITY: The MATLAB software and data for recreating all the experiments in this paper are available from http://www.cs.man.ac.uk/~neill/gpsim.
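The idea of integrating out the latent transcription factor activity can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation and rests on several assumptions that the abstract does not state: a linear response model of the form dx/dt = B + S f(t) - D x for a target gene, a squared-exponential covariance for the latent activity f, Gaussian observation noise, and a simple Riemann-sum quadrature on a fine grid used purely for illustration (the abstract indicates that the linear case is handled exactly, without such a discretization). All function and parameter names are hypothetical.

```python
import numpy as np

def rbf_kernel(t1, t2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance (an assumed prior for the latent f)."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def linear_response_operator(t_obs, t_grid, decay, sensitivity):
    """Matrix A with x(t_obs) - B/D ~= A @ f(t_grid).

    Assumes dx/dt = B + S f(t) - D x with x(0) = B/D, so that
    x(t) = B/D + S * integral_0^t exp(-D (t - u)) f(u) du.
    The integral is approximated by a Riemann sum on a uniform grid;
    this is an illustrative shortcut, not the closed-form treatment.
    """
    du = t_grid[1] - t_grid[0]
    lag = t_obs[:, None] - t_grid[None, :]
    causal = lag >= 0  # only past values of f influence x(t)
    return sensitivity * du * np.exp(-decay * np.clip(lag, 0.0, None)) * causal

def log_marginal_likelihood(y, A, K, noise_var):
    """log N(y | 0, A K A^T + noise_var * I), with f integrated out."""
    C = A @ K @ A.T + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, y)  # alpha @ alpha equals y^T C^{-1} y
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))

# Hypothetical usage: score two candidate decay rates on synthetic data.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 10.0, 201)
t_obs = np.linspace(1.0, 10.0, 7)
K = rbf_kernel(t_grid, t_grid, lengthscale=1.5)
f = np.linalg.cholesky(K + 1e-8 * np.eye(len(t_grid))) @ rng.standard_normal(len(t_grid))
A_true = linear_response_operator(t_obs, t_grid, decay=0.8, sensitivity=1.0)
y = A_true @ f + 0.05 * rng.standard_normal(len(t_obs))  # basal level B/D subtracted

for decay in (0.8, 3.0):
    A = linear_response_operator(t_obs, t_grid, decay, sensitivity=1.0)
    print(decay, log_marginal_likelihood(y, A, K, noise_var=0.05 ** 2))
```

Because the map from f to the observations is linear, the marginal distribution of the data stays Gaussian, so the likelihood of the kinetic parameters (here the decay D and sensitivity S) can be evaluated and optimized without sampling the latent function. Non-linear regulation breaks this property, which is why approximate inference is needed in that case.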
