NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.

[1]  Kathleen Marchal,et al.  SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms , 2006, BMC Bioinformatics.

[2]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[3]  Melissa J. Davis,et al.  Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets , 2012, Genome Medicine.

[4]  Ralf Herwig,et al.  Reverse Engineering of Gene Regulatory Networks: A Comparative Study , 2009, EURASIP J. Bioinform. Syst. Biol..

[5]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Kevin Kontos,et al.  Information-Theoretic Inference of Large Transcriptional Regulatory Networks , 2007, EURASIP J. Bioinform. Syst. Biol..

[7]  D. Floreano,et al.  Revealing strengths and weaknesses of methods for gene network inference , 2010, Proceedings of the National Academy of Sciences.

[8]  Diogo M. Camacho,et al.  Wisdom of crowds for robust gene network inference , 2012, Nature Methods.

[9]  Ralf Zimmer,et al.  Inferring gene regulatory networks by ANOVA , 2012, Bioinform..

[10]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[11]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[12]  Pedro Larrañaga,et al.  Bioinformatics Advance Access published August 24, 2007 A review of feature selection techniques in bioinformatics , 2022 .

[13]  Kalyan Veeramachaneni,et al.  Knowledge mining with genetic programming methods for variable selection in flavor design , 2010, GECCO '10.

[14]  Yvan Saeys,et al.  Robust Feature Selection Using Ensemble Feature Selection Techniques , 2008, ECML/PKDD.

[15]  Jean-Philippe Vert,et al.  TIGRESS: Trustful Inference of Gene REgulation using Stability Selection , 2012, BMC Systems Biology.

[16]  N. D. Clarke,et al.  Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges , 2010, PloS one.

[17]  I S Kohane,et al.  Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[18]  N. Lytkin,et al.  A comprehensive assessment of methods for de-novo reverse-engineering of genome-scale regulatory networks. , 2011, Genomics.

[19]  Kathleen Marchal,et al.  Comparative analysis of module-based versus direct methods for reverse-engineering transcriptional regulatory networks , 2009, BMC Systems Biology.

[20]  J. Collins,et al.  Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles , 2007, PLoS biology.

[21]  P. Geurts,et al.  Inferring Regulatory Networks from Expression Data Using Tree-Based Methods , 2010, PloS one.

[22]  Riet De Smet,et al.  Advantages and limitations of current network inference methods , 2010, Nature Reviews Microbiology.

[23]  Francis R. Bach,et al.  Bolasso: model consistent Lasso estimation through the bootstrap , 2008, ICML '08.

[24]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[25]  Frank Emmert-Streib,et al.  Bagging Statistical Network Inference from Large-Scale Gene Expression Data , 2012, PloS one.

[26]  Claudio Altafini,et al.  Comparing association network algorithms for reverse engineering of large-scale gene regulatory networks: synthetic versus real data , 2007, Bioinform..

[27]  Frank Emmert-Streib,et al.  Inferring the conservative causal core of gene regulatory networks , 2010, BMC Systems Biology.

[28]  Thibault Helleputte,et al.  Robust biomarker identification for cancer diagnosis with ensemble feature selection methods , 2010, Bioinform..

[29]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[30]  Dario Floreano,et al.  GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods , 2011, Bioinform..

[31]  Dario Floreano,et al.  Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods , 2009, J. Comput. Biol..

[32]  Adam A. Margolin,et al.  Reverse engineering cellular networks , 2006, Nature Protocols.

[33]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[34]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[35]  Mark Kotanchek,et al.  Pareto-Front Exploitation in Symbolic Regression , 2005 .

[36]  Aixia Guo,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2014 .