Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallows' Cp. Next, we show that the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, there are correlations among the measured accuracies in the different folds, and so the usual estimate of variance is too small. We introduce a nested cross-validation scheme to estimate this variance more accurately, and we show empirically that this modification leads to intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail.
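
To make the undercoverage mechanism concrete, the sketch below (not the paper's procedure) shows the naive K-fold interval for a least-squares model: it pools the held-out squared errors and forms a normal-theory interval whose standard error treats those errors as independent. The function name naive_cv_interval and the use of numpy, scipy, and scikit-learn are illustrative assumptions; the paper's nested cross-validation scheme instead estimates the variance of the cross-validation estimate from repeated splits, which accounts for the fold-to-fold correlations this version ignores.

```python
# A minimal sketch (illustrative assumptions, not the paper's method):
# naive K-fold CV estimate of prediction error with a normal-theory interval.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def naive_cv_interval(X, y, n_folds=10, alpha=0.05, seed=0):
    """Naive K-fold interval for squared-error prediction error.

    The standard error below treats the n held-out errors as independent;
    because each point is used for both training and testing, the errors
    are correlated, which is why this interval tends to undercover.
    """
    errs = np.empty(len(y), dtype=float)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        fit = LinearRegression().fit(X[train_idx], y[train_idx])
        errs[test_idx] = (y[test_idx] - fit.predict(X[test_idx])) ** 2
    est = errs.mean()
    se = errs.std(ddof=1) / np.sqrt(len(errs))   # naive, independence-based SE
    z = norm.ppf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)
```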