Gene Regulatory Network Inference from Systems Genetics Data Using Tree-Based Methods

One of the pressing open problems of computational systems biology is the elucidation of the topology of gene regulatory networks (GRNs). In an attempt to solve this problem, the idea of systems genetics is to exploit the natural variations that exist between the DNA sequences of related individuals and that can represent the randomized and multifactorial perturbations necessary to recover GRNs. In this chapter, we present new methods, called GENIE3-SG-joint and GENIE3-SG-sep, for the inference of GRNs from systems genetics data. Experiments on the artificial data of the StatSeq benchmark and of the DREAM5 Systems Genetics challenge show that exploiting jointly expression and genetic data is very helpful for recovering GRNs, and one of our methods outperforms by a large extent the official best performing method of the DREAM5 challenge.

[1]  Thomas Schiex,et al.  Gene Regulatory Network Reconstruction Using Bayesian Networks, the Dantzig Selector, the Lasso and Their Meta-Analysis , 2011, PloS one.

[2]  Terence Tao,et al.  The Dantzig selector: Statistical estimation when P is much larger than n , 2005, math/0506081.

[3]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[4]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[5]  David Kulp,et al.  Causal Inference of Regulator-Target Pairs by Gene Mapping of Expression Phenotypes , 2005, Systems Biology and Regulatory Genomics.

[6]  Diogo M. Camacho,et al.  Wisdom of crowds for robust gene network inference , 2012, Nature Methods.

[7]  Andrea Pinna,et al.  Bioinformatics Applications Note Systems Biology Simulating Systems Genetics Data with Sysgensim , 2022 .

[8]  Achim Zeileis,et al.  Bias in random forest variable importance measures: Illustrations, sources and a solution , 2007, BMC Bioinformatics.

[9]  Yan Cui,et al.  Inferring gene transcriptional modulatory relations: a genetical genomics approach. , 2005, Human molecular genetics.

[10]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[11]  Pedro Larrañaga,et al.  Bioinformatics Advance Access published August 24, 2007 A review of feature selection techniques in bioinformatics , 2022 .

[12]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[13]  Kevin Kontos,et al.  Information-Theoretic Inference of Large Transcriptional Regulatory Networks , 2007, EURASIP J. Bioinform. Syst. Biol..

[14]  Michal Linial,et al.  Using Bayesian Networks to Analyze Expression Data , 2000, J. Comput. Biol..

[15]  P. Geurts,et al.  Inferring Regulatory Networks from Expression Data Using Tree-Based Methods , 2010, PloS one.

[16]  K. Schughart,et al.  Data-driven assessment of eQTL mapping methods , 2010, BMC Genomics.

[17]  Keith Shockley,et al.  Structural Model Analysis of Multiple Quantitative Traits , 2006, PLoS genetics.

[18]  Klaus Nordhausen,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman , 2009 .

[19]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[20]  Gustavo Stolovitzky,et al.  Lessons from the DREAM2 Challenges , 2009, Annals of the New York Academy of Sciences.

[21]  A. Califano,et al.  Dialogue on Reverse‐Engineering Assessment and Methods , 2007, Annals of the New York Academy of Sciences.

[22]  N. D. Clarke,et al.  Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges , 2010, PloS one.

[23]  B. Yandell,et al.  Inferring Causal Phenotype Networks From Segregating Populations , 2008, Genetics.

[24]  A. G. de la Fuente,et al.  Gene Network Inference via Structural Equation Modeling in Genetical Genomics Experiments , 2008, Genetics.

[25]  N. Bing,et al.  Genetical Genomics Analysis of a Yeast Segregant Population for Transcription Network Inference , 2005, Genetics.

[26]  J. Castle,et al.  An integrative genomics approach to infer causal associations between gene expression and disease , 2005, Nature Genetics.

[27]  Ritsert C. Jansen,et al.  Studying complex biological systems using multifactorial perturbation , 2003, Nature Reviews Genetics.

[28]  Rachel B. Brem,et al.  The landscape of genetic complexity across 5,700 gene expression traits in yeast. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[29]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[30]  Steve Horvath,et al.  Using genetic markers to orient the edges in quantitative trait networks: The NEO software , 2008, BMC Systems Biology.

[31]  Riet De Smet,et al.  Advantages and limitations of current network inference methods , 2010, Nature Reviews Microbiology.

[32]  Jun Zhu,et al.  Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations , 2007, PLoS Comput. Biol..

[33]  John D. Storey,et al.  Harnessing naturally randomized transcription to infer regulatory relationships among genes , 2007, Genome Biology.

[34]  J. Nap,et al.  Genetical genomics: the added value from segregation. , 2001, Trends in genetics : TIG.