Unsupervised Gene Network Inference with Decision Trees and Random Forests

In this chapter, we introduce decision trees, a popular family of machine learning algorithms. We then review several decision tree-based approaches that have been developed for the inference of gene regulatory networks (GRNs). Decision trees have several properties that make them well suited to this problem: they can detect multivariate interacting effects between variables, are non-parametric, scale well, and have very few parameters. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
