Generalization through Minimal Networks with Application to Forecasting

Abstract: Inspired by the information-theoretic idea of minimum description length, we add a term to the usual back-propagation cost function that penalizes network complexity. From a Bayesian perspective, the complexity term can be usefully interpreted as an assumption about the prior distribution of the weights. This method, called weight-elimination, is contrasted with ridge regression and with cross-validation. We apply weight-elimination to time series prediction. On the sunspot series, the network outperforms traditional statistical approaches and shows the same predictive power as multivariate adaptive regression splines.
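
The abstract does not spell out the complexity term. As an illustration only, the following minimal sketch assumes the form usually associated with weight-elimination in the literature: a sum over weights of (w/w0)^2 / (1 + (w/w0)^2), scaled by a factor and added to the sum of squared errors. The function names, the scale `w0`, and the factor `lam` are hypothetical choices made for this example, not values taken from the paper.

```python
import numpy as np

def weight_elimination_penalty(weights, w0=1.0):
    """Assumed complexity term: sum of (w/w0)^2 / (1 + (w/w0)^2).

    Each weight contributes a value between 0 (w = 0) and 1 (|w| >> w0),
    so the term roughly counts the weights that are large relative to w0.
    """
    r = (np.asarray(weights, dtype=float) / w0) ** 2
    return float(np.sum(r / (1.0 + r)))

def ridge_penalty(weights):
    """Ridge-style term for comparison: sum of squared weights, unbounded."""
    return float(np.sum(np.asarray(weights, dtype=float) ** 2))

def total_cost(errors, weights, lam=0.01, w0=1.0):
    """Sum of squared errors plus lam times the assumed complexity term."""
    sq_err = float(np.sum(np.asarray(errors, dtype=float) ** 2))
    return sq_err + lam * weight_elimination_penalty(weights, w0)
```

Under this assumed form, the penalty on a single weight saturates as the weight grows, whereas the ridge-style term keeps increasing. This is one way to read the contrast with ridge regression drawn in the abstract.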