Automatic early stopping using cross validation: quantifying the criteria

Cross validation can be used to detect when overfitting begins during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ('early stopping'). The exact criterion used for cross-validation-based early stopping, however, is chosen in an ad hoc fashion by most researchers, or training is stopped interactively. To support a more well-founded selection of the stopping criterion, 14 different automatic stopping criteria from three classes were evaluated empirically for their efficiency and effectiveness on 12 different classification and approximation tasks, using multi-layer perceptrons with RPROP training. The experiments show that, on average, slower stopping criteria allow for small improvements in generalization (on the order of 4%), but cost about a factor of four more training time.
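One class of automatic stopping criteria mentioned in this line of work thresholds the relative increase of the validation error over its best value so far (a "generalization loss" criterion). The sketch below, a minimal illustration rather than the paper's exact procedure, assumes a hypothetical helper `early_stop_index` and a user-chosen threshold `alpha`:

```python
def early_stop_index(val_errors, alpha=5.0):
    """Return the epoch at which to stop training.

    Stops at the first epoch t where the generalization loss
        GL(t) = 100 * (E_va(t) / E_opt(t) - 1)
    exceeds `alpha`, where E_va(t) is the validation error at epoch t
    and E_opt(t) is the lowest validation error seen up to epoch t.
    If the threshold is never crossed, the last epoch is returned.
    """
    best = float("inf")
    for t, err in enumerate(val_errors):
        best = min(best, err)                 # E_opt(t): best error so far
        gl = 100.0 * (err / best - 1.0)       # relative loss vs. the best
        if gl > alpha:
            return t                          # overfitting signal: stop here
    return len(val_errors) - 1


# Example: validation error improves, then rises past the threshold.
errors = [1.0, 0.9, 0.8, 0.85, 0.9]
stop_at = early_stop_index(errors, alpha=5.0)
```

Smaller `alpha` values stop sooner (faster, possibly premature); larger values correspond to the "slower" criteria that the experiments found to buy a few percent of generalization at a multiple of the training cost.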
