Smooth ε-Insensitive Regression by Loss Symmetrization

We describe new loss functions for regression problems along with an accompanying algorithmic framework that utilizes these functions. These loss functions are derived by symmetrization of margin-based losses commonly used in boosting algorithms, namely, the logistic loss and the exponential loss. The resulting symmetric logistic loss can be viewed as a smooth approximation to the ε-insensitive hinge loss used in support vector regression. We describe and analyze two parametric families of batch learning algorithms for minimizing these symmetric losses. The first family employs an iterative log-additive update, which can be viewed as a regression counterpart to recent boosting algorithms. The second family employs an iterative additive update step. We also describe and analyze online gradient descent (GD) and exponentiated gradient (EG) algorithms for the symmetric logistic loss. A byproduct of our work is a new, simple form of regularization for boosting-based classification and regression algorithms. Our regression framework also has implications for classification, namely, a new additive-update boosting algorithm. We demonstrate the merits of our algorithms in a series of experiments.
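
As an illustrative sketch only (not the paper's exact formulation or update rules), one common way to write a symmetrized, ε-insensitive logistic loss for a linear predictor is log(1 + e^(δ−ε)) + log(1 + e^(−δ−ε)), where δ = w·x − y is the discrepancy. The snippet below minimizes this assumed loss form with plain batch gradient descent; the loss parameterization, function names, learning rate, and toy data are all assumptions for illustration and stand in for, rather than reproduce, the log-additive, additive, GD, and EG algorithms analyzed in the paper.

```python
import numpy as np

def symmetric_logistic_loss(w, X, y, eps):
    """Mean of log(1 + e^(d - eps)) + log(1 + e^(-d - eps)) over examples,
    where d = X @ w - y is the discrepancy.
    This loss form is an assumption made for illustration."""
    d = X @ w - y
    return np.mean(np.logaddexp(0.0, d - eps) + np.logaddexp(0.0, -d - eps))

def gradient(w, X, y, eps):
    """Gradient of the loss above with respect to w."""
    d = X @ w - y
    # Derivative w.r.t. the discrepancy: sigmoid(d - eps) - sigmoid(-(d + eps)).
    g = 1.0 / (1.0 + np.exp(-(d - eps))) - 1.0 / (1.0 + np.exp(d + eps))
    return X.T @ g / len(y)

def fit_gd(X, y, eps=0.1, lr=0.5, steps=500):
    """Plain batch gradient descent on the symmetric logistic loss
    (a hypothetical stand-in for the paper's update families)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * gradient(w, X, y, eps)
    return w

# Toy usage: noisy linear data with a known weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=200)
w_hat = fit_gd(X, y)
print(w_hat, symmetric_logistic_loss(w_hat, X, y, eps=0.1))
```

Because the per-example derivative is bounded by 1 in magnitude, this loss behaves like a smooth surrogate of the ε-insensitive hinge loss: discrepancies well inside the ±ε tube contribute almost no gradient, while large discrepancies contribute a nearly constant pull, which is the intuition behind the smooth approximation mentioned above.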
