Understanding variable importances in forests of randomized trees

Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in the asymptotic regime of infinite sample size and infinitely many trees. We derive a three-level decomposition of the information jointly provided by all input variables about the output, in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, and iii) the different interaction terms of a given degree. We then show that the MDI importance of a variable is zero if and only if the variable is irrelevant, and that the MDI importance of a relevant variable is invariant with respect to the removal or addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.
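
As a concrete illustration, here is a minimal sketch using scikit-learn, whose feature_importances_ attribute is an MDI estimate. Totally randomized trees can be approximated by ExtraTreesClassifier with max_features=1, which draws both the split variable and the cut-point at random, without looking at the output; note that this differs from the setting analyzed above (where categorical variables are split exhaustively into all their values) and that scikit-learn normalizes importances to sum to one rather than to the joint information. The XOR target is chosen so that each relevant variable is informative only through its interaction with the other.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)

# Toy target: y = X0 XOR X1, so each relevant variable is informative
# only jointly with the other; the three extra inputs below are pure noise.
n = 10_000
X_rel = rng.randint(0, 2, size=(n, 2))
y = X_rel[:, 0] ^ X_rel[:, 1]

def mdi_importances(X, y, n_trees=300):
    # max_features=1 picks one variable uniformly at random at each node
    # and splits it at a random threshold, approximating totally
    # randomized trees.
    forest = ExtraTreesClassifier(
        n_estimators=n_trees, max_features=1, random_state=0
    ).fit(X, y)
    return forest.feature_importances_  # normalized MDI scores

X_irr = rng.randint(0, 2, size=(n, 3))            # irrelevant inputs
print(mdi_importances(X_rel, y))                  # ~ [0.5, 0.5]
print(mdi_importances(np.hstack([X_rel, X_irr]), y))
# Relevant variables keep roughly equal importance; irrelevant ones
# shrink toward zero as the sample grows (exactly zero only asymptotically).
```

Because a first split on X0 or X1 alone leaves the XOR target balanced, the MDI importance of each relevant variable accrues entirely at nodes where the other has already been split, i.e. from degree-two interaction terms in the decomposition above. With non-totally randomized trees (e.g. default Extra-Trees or Random Forests, which select splits using the output), masking effects can break these invariance properties, as discussed in the abstract.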
