A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression

Background: The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied without regard to the fact that the data is drawn from a time series. In this paper we propose a simple model, based on a Gaussian process, that accounts for the underlying temporal nature of the data.

Results: We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach that can be used to filter quiet genes or, for time series in the form of expression ratios, to quantify differential expression. We assess the rankings produced by our regression framework via ROC curves and compare them with those of a recently proposed hierarchical Bayesian model for the analysis of gene expression time series (BATS). We compare on both simulated and experimental data, showing that the proposed approach considerably outperforms the current state of the art.

Conclusions: Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of microarray time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values, and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.
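To make the idea of ranking profiles with GP regression concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each profile is scored by the difference between the log marginal likelihood of a GP with a smooth squared-exponential covariance and that of a noise-only model; the abstract does not specify the scoring rule, so this choice is an assumption. The function names (`rbf_covariance`, `log_marginal_likelihood`, `rank_score`), the small hyperparameter grid (standing in for gradient-based optimisation), and the toy profiles are all hypothetical.

```python
import math

def rbf_covariance(t, lengthscale, signal_var, noise_var):
    """Squared-exponential covariance plus independent noise on the diagonal."""
    n = len(t)
    return [[signal_var * math.exp(-0.5 * ((t[i] - t[j]) / lengthscale) ** 2)
             + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_marginal_likelihood(t, y, lengthscale, signal_var, noise_var):
    """log N(y | 0, K) for a zero-mean GP observed at times t."""
    n = len(y)
    low = cholesky(rbf_covariance(t, lengthscale, signal_var, noise_var))
    z = []  # forward substitution: solve low * z = y
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

def rank_score(t, y):
    """Assumed score: best smooth-GP log evidence minus noise-only log evidence."""
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]                       # centre the profile
    total_var = sum(v * v for v in yc) / len(yc) + 1e-9
    # Noise-only model: all of the observed variance is attributed to noise.
    noise_only = log_marginal_likelihood(t, yc, 1.0, 0.0, total_var)
    # Hypothetical coarse grid; a real analysis would optimise hyperparameters.
    best = max(
        log_marginal_likelihood(t, yc, ell, frac * total_var,
                                (1.0 - frac) * total_var)
        for ell in (0.5, 1.0, 2.0, 4.0)
        for frac in (0.5, 0.8, 0.95)
    )
    return best - noise_only

# Toy data (made up for illustration): one smooth profile, one quiet profile.
times = [float(i) for i in range(8)]
active = [math.sin(2.0 * math.pi * ti / 8.0) for ti in times]
quiet = [0.10, -0.10, 0.05, -0.12, 0.08, -0.05, 0.11, -0.09]

scores = {"active": rank_score(times, active), "quiet": rank_score(times, quiet)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Under these assumptions, a profile with smooth temporal structure receives a higher score than one that looks like noise, so sorting by the score yields a ranking that could then be assessed with ROC curves. Replicates fit in without special handling: repeated time points simply appear more than once in `times`, and the noise term keeps the covariance matrix positive definite.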
