Deep Generative Models for Detecting Differential Expression in Single Cells

Detecting differentially expressed genes is important for characterizing subpopulations of cells. However, in scRNA-seq data, nuisance variation due to technical factors such as sequencing depth and RNA capture efficiency obscures the underlying biological signal. First, we show that deep generative models, which combine Bayesian statistics and deep neural networks, better estimate the log-fold-change in gene expression levels between subpopulations of cells. Second, we use Bayesian decision theory to detect differentially expressed genes while controlling the false discovery rate. Our experiments on simulated and real datasets show that our approach outperforms state-of-the-art differential expression (DE) frameworks. Finally, we introduce a technique for improving the posterior approximation and show that it also improves differential expression performance.
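
The abstract does not spell out the decision rule, so the following is only a minimal sketch of one common way to control the false discovery rate from posterior quantities, not the paper's implementation. It assumes that posterior samples of the per-gene log-fold-change are already available (for example, drawn from a fitted generative model), calls a gene differentially expressed when the absolute log-fold-change exceeds a hypothetical threshold `delta`, and selects the largest set of genes whose posterior expected FDR stays at or below a target `alpha`. The function names, `delta`, and `alpha` are illustrative assumptions.

```python
import numpy as np


def posterior_de_probability(lfc_samples, delta=0.5):
    """Estimate P(|LFC| > delta | data) for each gene.

    lfc_samples: array of shape (n_samples, n_genes) holding posterior
    draws of the log-fold-change (assumed to come from a fitted model).
    delta: hypothetical effect-size threshold, chosen for illustration.
    """
    lfc_samples = np.asarray(lfc_samples, dtype=float)
    return (np.abs(lfc_samples) > delta).mean(axis=0)


def select_de_genes(p_de, alpha=0.05):
    """Pick the largest gene set whose posterior expected FDR is <= alpha.

    If the k most probable genes are called DE, the posterior expected
    proportion of false discoveries is the mean of (1 - p_de) over them.
    """
    p_de = np.asarray(p_de, dtype=float)
    order = np.argsort(-p_de)  # most probable DE genes first
    ranks = np.arange(1, p_de.size + 1)
    expected_fdr = np.cumsum(1.0 - p_de[order]) / ranks
    passing = np.nonzero(expected_fdr <= alpha)[0]
    k = passing[-1] + 1 if passing.size else 0
    selected = np.zeros(p_de.size, dtype=bool)
    selected[order[:k]] = True
    return selected
```

Because the genes are ranked by decreasing posterior probability, the expected FDR of the top-k set never decreases as k grows, so taking the last passing k yields the largest admissible set under these assumptions.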
