Paper Information - Early Stopping - But When?

Early Stopping - But When?

Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting (“early stopping”). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding up learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multi-layer perceptrons shows that there is a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average) but cost much more training time (here: about a factor of 4 longer on average).
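The abstract does not spell out a specific criterion, so the following is only a minimal sketch of what a systematic validation-based stopping rule might look like. It assumes a threshold rule on the relative increase of the validation error over the lowest value seen so far, expressed in percent. The function name `stop_epoch`, the parameter `alpha`, and the sample error values are illustrative assumptions, not taken from the text above.

```python
def stop_epoch(val_errors, alpha=5.0):
    """Return the 0-based epoch at which training would stop, or None.

    Assumed rule (illustrative): stop at the first epoch where the
    validation error exceeds the best validation error seen so far
    by more than `alpha` percent. Errors are assumed to be positive.
    """
    best = float("inf")
    for epoch, err in enumerate(val_errors):
        best = min(best, err)                # lowest validation error so far
        loss_pct = 100.0 * (err / best - 1)  # relative increase in percent
        if loss_pct > alpha:
            return epoch
    return None  # the criterion never fired within the recorded epochs


# Hypothetical validation-error curve: improves, then starts to overfit.
curve = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9]
fast = stop_epoch(curve, alpha=5.0)    # stricter threshold, stops earlier
slow = stop_epoch(curve, alpha=20.0)   # looser threshold, trains longer
```

In this sketch, a larger `alpha` corresponds to a slower criterion: it tolerates a larger rise in validation error before stopping, so training runs longer. That is the kind of time-versus-generalization tradeoff the abstract describes, although the numbers here are made up for illustration.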

Lutz Prechelt
