SCENIC: Single-cell regulatory network inference and clustering

Single-cell RNA-seq allows building cell atlases of any given tissue and inferring the dynamics of cellular state transitions during developmental or disease trajectories. Both the maintenance and the transitions of cell states are encoded by regulatory programs in the genome sequence. However, this regulatory code has not yet been exploited to guide the identification of cellular states from single-cell RNA-seq data. Here we describe a computational resource, called SCENIC (Single Cell rEgulatory Network Inference and Clustering), for the simultaneous reconstruction of gene regulatory networks (GRNs) and identification of stable cell states from single-cell RNA-seq data. SCENIC outperforms existing approaches in cell clustering and transcription factor identification. Importantly, we show that cell state identification based on GRNs is robust to batch effects and technical biases. We applied SCENIC to a compendium of single-cell data from the mouse and human brain and demonstrated that the proper combinations of transcription factors, target genes, enhancers, and cell types can be identified. Moreover, we used SCENIC to map the cell state landscape in melanoma and identified a gene regulatory network underlying a proliferative melanoma state driven by MITF and STAT, and a contrasting network controlling an invasive state governed by NFATC2 and NFIB. We further validated these predictions by showing that two transcription factors are predominantly expressed in early metastatic sentinel lymph nodes. In summary, SCENIC is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric, approach. SCENIC is generic, easy to use, and flexible, and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs.

Availability: SCENIC is available as an R workflow based on three new R/Bioconductor packages: GENIE3, RcisTarget, and AUCell. As a scalable alternative to GENIE3, we also provide GRNboost, paving the way toward network analysis across millions of single cells.
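AUCell, one of the three packages named above, scores how active a regulon (a transcription factor together with its target genes) is in each cell using an "Area Under the Curve" measure. The Python sketch below illustrates that general idea under stated assumptions; it is not the AUCell package's code. Genes are ranked by expression within each cell, and a regulon's score is the area under its recovery curve among the top-ranked genes. The function names, the alphabetical tie-breaking, the `top_fraction` cutoff, and the normalization by the best achievable area are hypothetical simplifications chosen for illustration, and the gene names in the toy data are placeholders.

```python
import math
from typing import Dict, List, Sequence, Set


def rank_genes(expression: Dict[str, float]) -> List[str]:
    """Order genes from highest to lowest expression in one cell.

    Ties are broken alphabetically for determinism (an assumption made
    for this sketch, not AUCell's tie-handling rule).
    """
    return sorted(expression, key=lambda gene: (-expression[gene], gene))


def recovery_auc(ranking: Sequence[str], gene_set: Set[str], max_rank: int) -> float:
    """Normalized area under the recovery curve of `gene_set` in `ranking`.

    At rank position r, the recovery curve counts how many genes of the set
    appear among the top r genes. The area sums that count for
    r = 1..max_rank and is divided by the area obtained if the set's genes
    occupied the very top of the ranking (a simplified normalization).
    """
    max_rank = min(max_rank, len(ranking))
    found = 0
    area = 0
    for gene in ranking[:max_rank]:
        if gene in gene_set:
            found += 1
        area += found
    # Best case: the curve rises by one per rank until the set is exhausted.
    best = sum(min(r, len(gene_set)) for r in range(1, max_rank + 1))
    return area / best if best else 0.0


def score_cells(
    cells: Dict[str, Dict[str, float]],
    regulon: Set[str],
    top_fraction: float = 0.05,
) -> Dict[str, float]:
    """Score one regulon (a set of target genes) in every cell.

    Only the top `top_fraction` of each cell's ranking is considered;
    the fraction is a hypothetical, illustrative parameter.
    """
    scores = {}
    for cell, expression in cells.items():
        ranking = rank_genes(expression)
        max_rank = max(1, math.ceil(len(ranking) * top_fraction))
        scores[cell] = recovery_auc(ranking, regulon, max_rank)
    return scores


if __name__ == "__main__":
    # Toy data with placeholder gene names (not real measurements).
    cells = {
        "A": {"g1": 9.0, "g2": 8.0, "g3": 1.0, "g4": 0.0},
        "B": {"g1": 0.0, "g2": 1.0, "g3": 9.0, "g4": 8.0},
        "C": {"g1": 9.0, "g2": 0.0, "g3": 8.0, "g4": 1.0},
    }
    print(score_cells(cells, regulon={"g1", "g2"}, top_fraction=0.5))
```

On the toy data, cell A (both regulon genes ranked at the top) scores 1.0, cell B (neither gene in its top half) scores 0.0, and cell C falls in between at 2/3. Because the score depends only on each cell's internal gene ranking, it is unchanged by any per-cell monotonic rescaling, such as dividing by library size or taking logarithms.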