Bias vs Variance Decomposition for Regression and Classification

This chapter introduces the concepts of bias and variance. After an intuitive presentation of the bias/variance tradeoff, we discuss the bias/variance decompositions of the mean squared error (for regression problems) and of the mean misclassification error (for classification problems). We then carry out a small empirical study that gives some insight into how the parameters of a learning algorithm influence bias and variance.
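
As a rough illustration of the regression case, the sketch below estimates squared bias and variance by Monte Carlo, refitting a polynomial on many independently drawn training sets. Everything specific here is an assumption made for illustration and is not taken from the chapter: the target function `true_fn`, the noise level, the sample sizes, and the use of polynomial degree as the complexity parameter. It relies on the standard decomposition of the expected squared error into noise, squared bias, and variance.

```python
import numpy as np


def true_fn(x):
    # Hypothetical target function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_train=30, n_sets=200, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of squared bias and variance of a polynomial fit.

    Returns (bias_sq, variance, noise_variance), each averaged over a fixed
    grid of test points.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # Draw a fresh training set from the same distribution each time.
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_test)
    # Average model over training sets, evaluated at each test point.
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average model and the noise-free target.
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)
    # Variance: spread of the individual models around the average model.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance, noise_sd ** 2


for d in (1, 3, 8):
    b, v, n = bias_variance(d)
    print(f"degree={d}: bias^2={b:.4f} variance={v:.4f} noise={n:.4f}")
```

Under these assumptions, low-degree fits should show the larger squared bias and high-degree fits the larger variance, which is the tradeoff the abstract refers to. The misclassification-error case has several competing decompositions and is not covered by this sketch.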
