Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule

A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? The traditional answer to this question is given by cross-validation. The cross-validation estimate of prediction error is nearly unbiased, but can be highly variable. This article discusses bootstrap estimates of prediction error, which can be thought of as smoothed versions of cross-validation. A particular bootstrap method, the .632+ rule, is shown to substantially outperform cross-validation in a catalog of 24 simulation experiments. Besides providing point estimates, we also consider estimating the variability of an error rate estimate. All of the results here are nonparametric and apply to any possible prediction rule; however, we only study classification problems with 0-1 loss in detail. Our simulations include "smooth" prediction rules like Fisher's Linear Discriminant Function, and unsmooth ones like Nearest Neighbors.
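The estimators compared above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual simulation design: it uses an assumed toy two-class Gaussian dataset and a simple nearest-centroid classifier, and it computes the apparent (resubstitution) error, the leave-one-out cross-validation error, the leave-one-out bootstrap error, and the plain .632 combination. The full .632+ rule additionally corrects the weights for overfitting and is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data (an assumption for illustration, not from the paper)
n = 60
X = np.vstack([rng.normal(0.0, 1.0, (n // 2, 2)),
               rng.normal(1.5, 1.0, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

def fit_centroids(X, y):
    # A "smooth" rule in the paper's sense: classify to the nearest class mean
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = list(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

def err(y_true, y_pred):
    # 0-1 loss averaged over cases
    return float(np.mean(y_true != y_pred))

# Apparent error: train and test on the same data (optimistically biased)
err_app = err(y, predict(fit_centroids(X, y), X))

# Leave-one-out cross-validation: nearly unbiased, but can be highly variable
cv_miss = []
for i in range(n):
    keep = np.delete(np.arange(n), i)
    c = fit_centroids(X[keep], y[keep])
    cv_miss.append(predict(c, X[i:i + 1])[0] != y[i])
err_cv = float(np.mean(cv_miss))

# Leave-one-out bootstrap: each point is predicted only by bootstrap
# samples that happen to exclude it (a smoothed analogue of CV)
B = 200
miss = [[] for _ in range(n)]
for _ in range(B):
    idx = rng.integers(0, n, n)                    # bootstrap sample indices
    out = np.setdiff1d(np.arange(n), idx)          # points left out of the sample
    if out.size == 0:
        continue
    c = fit_centroids(X[idx], y[idx])
    pred = predict(c, X[out])
    for i, p in zip(out, pred):
        miss[i].append(p != y[i])
err1 = float(np.mean([np.mean(m) for m in miss if m]))

# The .632 combination of apparent and leave-one-out bootstrap error;
# the .632+ rule further adjusts these weights using the overfitting rate
err632 = 0.368 * err_app + 0.632 * err1
print(f"apparent={err_app:.3f}  cv={err_cv:.3f}  "
      f"loo-bootstrap={err1:.3f}  .632={err632:.3f}")
```

Because a bootstrap sample leaves out roughly 36.8% of the original points on average, the leave-one-out bootstrap error tends to be pessimistic, and the apparent error optimistic; the .632 weights blend the two to offset these biases.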