Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallows' Cp. Next, we show that the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, the measured accuracies for the different folds are correlated, so the usual estimate of variance is too small. We introduce a nested cross-validation scheme to estimate this variance more accurately, and show empirically that this modification leads to intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail. Lastly, our analysis also shows that when producing confidence intervals for prediction accuracy with simple data splitting, one should not re-fit the model on the combined data, since this invalidates the confidence intervals.
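To make the "standard confidence interval" above concrete, here is a minimal sketch using only the Python standard library. It runs K-fold cross-validation for a one-predictor OLS fit under squared-error loss and forms the naive interval that treats the n held-out losses as independent. The function names `fit_ols` and `naive_cv_interval`, the round-robin fold assignment, and the normal-approximation interval are illustrative assumptions. This is the naive baseline described as under-covering, not the authors' nested cross-validation procedure.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b * x (one predictor plus intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def naive_cv_interval(xs, ys, k=5, alpha=0.1, seed=0):
    """K-fold CV estimate of squared prediction error plus the naive interval.

    Hypothetical helper for illustration. The interval uses
    stdev(losses) / sqrt(n), i.e. it treats the n held-out losses as
    independent, which ignores the correlation between folds.
    """
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k roughly equal folds
    losses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        # squared-error loss on each point of the held-out fold
        losses.extend((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    err_hat = mean(losses)
    se_naive = stdev(losses) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return err_hat, (err_hat - z * se_naive, err_hat + z * se_naive)


# Usage on simulated data: y = 2 + 3x + N(0, 1) noise (an assumed toy model).
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
err_hat, (lo, hi) = naive_cv_interval(xs, ys, k=10)
print(f"CV error {err_hat:.3f}, naive 90% interval [{lo:.3f}, {hi:.3f}]")
```

Each observation sits in the training set of k − 1 folds, so the held-out losses are not independent. As a result, `stdev(losses) / sqrt(n)` can understate the true variability, which is the gap the nested scheme is meant to close.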