Optimizing model-agnostic Random Subspace ensembles

This paper presents a model-agnostic ensemble approach for supervised learning. The proposed approach alternates between (1) learning an ensemble of models using a parametric version of the Random Subspace approach, in which feature subsets are sampled according to Bernoulli distributions, and (2) identifying the parameters of the Bernoulli distributions that minimize the generalization error of the ensemble model. Parameter optimization is rendered tractable by an importance sampling approach that estimates the expected model output for any given parameter set without the need to learn new models. Whereas the degree of randomization is controlled by a hyper-parameter in the standard Random Subspace method, it is tuned automatically in our parametric version. Furthermore, model-agnostic feature importance scores can be easily derived from the trained ensemble model. We demonstrate the good performance of the proposed approach, both in terms of prediction and feature ranking, on simulated and real-world datasets. We also show that our approach can be successfully used for the reconstruction of gene regulatory networks.
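To make the two alternating steps concrete, below is a minimal sketch in Python, assuming a regression setting, decision trees as base learners, and a squared-error objective on a held-out validation set. All names (train_ensemble, reweighted_prediction, alpha_old, alpha_new, ...) are illustrative assumptions, not the authors' implementation: the first function corresponds to step (1), parametric Random Subspace sampling, while the last two implement the importance-sampling reweighting used in step (2) to score a candidate parameter vector without retraining any model.

```python
# Hedged sketch of parametric Random Subspace with importance-sampling
# parameter tuning. Names and structure are assumptions for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def train_ensemble(X, y, alpha, n_models=100, rng=None):
    """Step (1): fit each base model on a feature subset drawn by sampling
    feature j with probability alpha[j] (independent Bernoulli draws)."""
    rng = np.random.default_rng(rng)
    models, masks = [], []
    for _ in range(n_models):
        mask = rng.random(X.shape[1]) < alpha
        if not mask.any():                       # avoid empty feature subsets
            mask[rng.integers(X.shape[1])] = True
        models.append(DecisionTreeRegressor().fit(X[:, mask], y))
        masks.append(mask)
    return models, masks


def log_subset_prob(mask, alpha, eps=1e-12):
    """log P(mask) under independent Bernoulli(alpha) feature sampling."""
    a = np.clip(alpha, eps, 1 - eps)
    return np.sum(np.where(mask, np.log(a), np.log(1 - a)))


def reweighted_prediction(X, models, masks, alpha_new, alpha_old):
    """Importance sampling: estimate the expected ensemble output under
    candidate parameters alpha_new, reusing models trained under alpha_old,
    by weighting each model with the likelihood ratio of its subset."""
    log_w = np.array([log_subset_prob(m, alpha_new) - log_subset_prob(m, alpha_old)
                      for m in masks])
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                 # self-normalised weights
    preds = np.stack([model.predict(X[:, mask])
                      for model, mask in zip(models, masks)])
    return w @ preds


def validation_error(alpha_new, alpha_old, models, masks, X_val, y_val):
    """Objective for step (2): error of the reweighted ensemble on held-out data."""
    y_hat = reweighted_prediction(X_val, models, masks, alpha_new, alpha_old)
    return np.mean((y_val - y_hat) ** 2)
```

In use, alpha_new would be searched (for instance by gradient-based or gradient-free minimization of validation_error), the ensemble retrained at the selected parameters, and the two steps repeated; the final Bernoulli parameters can then be read directly as feature importance scores, as described in the abstract.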
