Greedy function approximation: A gradient boosting machine.

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such TreeBoost models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
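The paradigm summarized above, fitting each new additive component to the pointwise negative gradient of the loss, can be sketched for the simplest case: squared-error loss with regression stumps as base learners, where the negative gradient is just the current residual. This is a minimal illustrative sketch, not the paper's TreeBoost procedure; all function names and data are my own.

```python
# Minimal sketch of least-squares gradient boosting with regression stumps.
# For L2 loss, the negative gradient -dL/dF at each point equals the residual,
# so each round fits a stump to the residuals of the current model.

def fit_stump(x, r):
    """Best single-split fit to residuals r over 1-D inputs x,
    by least squares; returns (threshold, left_mean, right_mean)."""
    best, best_err = None, float("inf")
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if err < best_err:
            best_err, best = err, (t, lm, rm)
    return best

def predict_stump(stump, xi):
    t, lm, rm = stump
    return lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Stagewise additive expansion: start from the constant model,
    then repeatedly fit a stump to the residuals and take a small
    step (lr) in that direction."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        if stump is None:  # no useful split left
            break
        stumps.append(stump)
        pred = [pi + lr * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return f0, lr, stumps

def predict(model, xi):
    f0, lr, stumps = model
    return f0 + sum(lr * predict_stump(s, xi) for s in stumps)
```

For example, on a step-function target the ensemble's predictions converge geometrically toward the left and right group means as rounds accumulate; the shrinkage factor `lr` trades per-round progress for robustness, one of the themes the paper develops.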

[1] H. Warner et al. A mathematical approach to medical diagnosis. Application to congenital heart disease, 1961, JAMA.

[2] Frederick R. Forst et al. On robust estimation of the location parameter, 1980.

[3] J. Copas. Regression, Prediction and Shrinkage, 1983.

[4] Geoffrey E. Hinton et al. Learning representations by back-propagating errors, 1986, Nature.

[5] M. J. D. Powell et al. Radial basis functions for multivariable interpolation: a review, 1987.

[6] R. Tibshirani et al. Generalized Additive Models, 1991.

[7] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[8] Stéphane Mallat et al. Matching pursuits with time-frequency dictionaries, 1993, IEEE Trans. Signal Process.

[9] David L. Donoho et al. Nonlinear Wavelet Methods for Recovery of Signals, Densities, and Spectra from Indirect and Noisy Data, 1993.

[10] Vladimir Vapnik. The Nature of Statistical Learning Theory, 1995.

[11] Yoshua Bengio et al. Pattern Recognition and Neural Networks, 1995.

[12] L. Breiman. Pasting Bites Together For Prediction In Large Data Sets And On-Line, 1996.

[13] Richard A. Becker et al. The Visual Design and Control of Trellis Display, 1996.

[14] Yoav Freund et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[15] Harris Drucker et al. Improving Regressors using Boosting Techniques, 1997, ICML.

[16] Yoram Singer et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[17] Trevor Hastie et al. Additive Logistic Regression: A Statistical View of Boosting, 1998.

[18] Leo Breiman. Prediction Games and Arcing Algorithms, 1999, Neural Computation.

[19] David P. Helmbold et al. A geometric approach to leveraging weak learners, 1999, Theor. Comput. Sci.

[20] Martin D. Buhmann. Radial Basis Functions, 2000, Acta Numerica.

[21] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[22] Andrew D. Back. Radial Basis Functions, 2001.