Random Forests for Regression as a Weighted Sum of $k$-Potential Nearest Neighbors

In this paper, we study random forests for regression expressed as weighted sums of datapoints. We analyze the theoretical behavior of $k$-potential nearest neighbors ($k$-PNNs) under bagging and derive an upper bound on the weight of any datapoint in a random forest with an arbitrary splitting criterion, provided the trees are unpruned and stop growing only when they have $k$ or fewer datapoints at their leaves. Using this bound, together with the concept of b-terms (i.e., bootstrap terms) introduced in this paper, we derive the explicit expression of datapoint weights under random $k$-PNN selection, a datapoint selection strategy that we also introduce, and we build a framework for deriving other bagged estimators by a similar procedure. Finally, within this framework we derive the explicit weights of a regression estimate that is equivalent to a random forest regression estimate with the random splitting criterion, and we demonstrate the equivalence both theoretically and empirically.
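The view of a random forest regression estimate as a weighted sum of datapoints can be checked numerically. The sketch below is an illustration, not the paper's estimator: it fits a scikit-learn `RandomForestRegressor` without bootstrap (so each tree's leaf value is the exact mean of the training targets in that leaf), recovers each training point's weight from leaf co-membership via `apply`, and verifies that the weighted sum of targets reproduces the forest's prediction. The stopping rule "a node becomes a leaf once it holds $k$ or fewer datapoints" is approximated here with `min_samples_split=k+1`; all parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)

k = 5  # leaves hold at most k points (approximated via min_samples_split)
n_trees = 50
rf = RandomForestRegressor(
    n_estimators=n_trees,
    bootstrap=False,          # no resampling: leaf values are exact leaf means
    min_samples_split=k + 1,  # nodes with <= k points are not split further
    random_state=0,
)
rf.fit(X, y)

x_new = rng.uniform(size=(1, 3))
train_leaves = rf.apply(X)      # (n_samples, n_trees) leaf index per tree
query_leaves = rf.apply(x_new)  # (1, n_trees)

# Weight of training point i: average over trees of 1/|leaf| when
# x_i falls in the same leaf as the query point, 0 otherwise.
w = np.zeros(len(X))
for b in range(n_trees):
    same_leaf = train_leaves[:, b] == query_leaves[0, b]
    w += same_leaf / same_leaf.sum()
w /= n_trees

pred_weighted = w @ y
pred_forest = rf.predict(x_new)[0]
```

Since each tree predicts the mean of its leaf and the forest averages the trees, `pred_weighted` matches `pred_forest` up to floating-point error, and the weights sum to one.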
