Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

A statistical theory of overtraining is proposed. The analysis treats realizable stochastic neural networks trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in generalization error from early stopping is small, even if the optimal stopping time is available. For cross-validation stopping, we answer the question of in what ratio the examples should be divided into training and testing sets to obtain optimal performance. In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations, carried out on a CM5, are in good agreement with the analytical findings.
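To make the cross-validated early-stopping procedure discussed in the abstract concrete, the following is a minimal sketch in Python. The toy logistic-regression model, the 80/20 split, the learning rate, and the iteration budget are all illustrative assumptions and not the paper's prescriptions; the paper's contribution is precisely the asymptotic analysis of how such a split ratio and stopping point should be chosen.

```python
# Minimal sketch of cross-validated early stopping (illustration only).
# The model, the 80/20 split, and all hyperparameters are assumptions
# chosen for demonstration; they are not the paper's prescriptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data from a noisy linear teacher.
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

# Split examples into training and testing (validation) sets; the ratio
# is exactly the quantity the theory is meant to optimize -- here we
# simply pick 80/20 as a placeholder.
split = int(0.8 * n)
X_tr, y_tr = X[:split], y[:split]
X_va, y_va = X[split:], y[split:]

def nll(w, X, y):
    """Mean negative log-likelihood (cross-entropy) of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

w = np.zeros(d)
lr = 0.1
best_val, best_w, best_t = np.inf, w.copy(), 0

for t in range(2000):
    # Gradient step on the training loss.
    p = 1.0 / (1.0 + np.exp(-X_tr @ w))
    grad = X_tr.T @ (p - y_tr) / len(y_tr)
    w -= lr * grad

    # Track the validation loss and remember the best iterate:
    # this is the "cross-validation stopping" point.
    val = nll(w, X_va, y_va)
    if val < best_val:
        best_val, best_w, best_t = val, w.copy(), t

print(f"stopped at iteration {best_t}, validation loss {best_val:.4f}")
```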