Closed-form dual perturb and combine for tree-based models

This paper studies the aggregation of predictions made by tree-based models over several perturbed versions of the attribute vector of a test case. A closed-form approximation of this scheme, combined with cross-validation to tune the level of perturbation, is proposed. This yields soft-tree models in a parameter-free way and preserves their interpretability. Empirical evaluations on classification and regression problems show that accuracy and the bias/variance tradeoff are improved significantly, at the price of an acceptable computational overhead. The method is further compared and combined with tree bagging.
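The scheme the abstract describes can be illustrated with a Monte Carlo version of dual perturb and combine: each test attribute vector is perturbed with additive Gaussian noise several times, and the tree's predictions over the noisy copies are averaged. The sketch below is an assumption-laden illustration, not the paper's method: it uses scikit-learn's `DecisionTreeRegressor` on toy data, a fixed noise level `sigma` (which the paper instead tunes by cross-validation), and explicit averaging in place of the paper's closed-form approximation. The helper `dual_pc_predict` is a hypothetical name introduced here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy regression problem: a noisy sine over two attributes.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

def dual_pc_predict(model, X_test, sigma, n_perturb=100, rng=rng):
    """Monte Carlo dual perturb and combine (illustrative sketch):
    average the model's predictions over noisy copies of each test
    attribute vector. sigma controls the perturbation level."""
    preds = np.stack([
        model.predict(X_test + rng.normal(scale=sigma, size=X_test.shape))
        for _ in range(n_perturb)
    ])
    return preds.mean(axis=0)

X_test = rng.uniform(-1, 1, size=(20, 2))
raw = tree.predict(X_test)                         # hard, piecewise-constant
smoothed = dual_pc_predict(tree, X_test, sigma=0.2)  # soft-tree behaviour
```

Averaging over perturbed inputs smooths the piecewise-constant response of the tree, which is what the paper's closed-form approximation achieves analytically without the Monte Carlo sampling cost.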