Extremely randomized trees

This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of strongly randomizing both attribute and cut-point choices when splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided, as well as a geometrical and a kernel characterization of the models induced.
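
To make the split randomization concrete, here is a minimal Python sketch of how a node split could be drawn: K attributes are picked at random, one uniform random cut-point is drawn per attribute, and the highest-scoring candidate is kept. The function `pick_random_split`, the `score` callable, and the parameter K are illustrative assumptions rather than the paper's exact procedure; with K = 1 the split choice becomes fully random, matching the totally randomized trees mentioned above.

```python
# Minimal sketch of the split randomization described in the abstract,
# assuming a numeric learning sample X (n_samples x n_features), labels y,
# and a user-supplied split-scoring callable.  Names and parameters here
# are illustrative placeholders, not the paper's exact interface.
import numpy as np

def pick_random_split(X, y, K, rng, score):
    """Try K randomly chosen attributes, draw one uniform random cut-point
    for each, and return the candidate split with the best score."""
    attrs = rng.choice(X.shape[1], size=min(K, X.shape[1]), replace=False)
    best = None
    for a in attrs:
        lo, hi = X[:, a].min(), X[:, a].max()
        if lo == hi:                  # constant attribute: no valid cut-point
            continue
        cut = rng.uniform(lo, hi)     # cut-point drawn at random, not optimized
        s = score(y, X[:, a] < cut)   # score the induced partition of the node sample
        if best is None or s > best[0]:
            best = (s, a, cut)
    return best                       # (score, attribute index, cut-point) or None
```

Growing many such trees on the full learning sample and aggregating their predictions yields the ensemble; how large K should be is precisely the tuning question the abstract discusses.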
