Initializations, back-propagation and generalization of feed-forward classifiers

The backpropagation method is highly sensitive to the initial weights. A commonly used heuristic is therefore to train many networks from different random initializations and to select, as the final network, the one with the lowest mean squared error. It is shown that this simple heuristic, intended to improve training, sometimes favors neural network classifiers with poor generalization capabilities. A measure is proposed to quantify this phenomenon, and it is studied as a function of training time.
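The multi-restart selection heuristic described in the abstract can be sketched as follows. This is an illustrative toy, not the paper's experimental setup: the XOR task, the 2-2-1 network size, the learning rate, and the number of restarts are all assumed here for demonstration.

```python
import math
import random

# Toy dataset: XOR, a classic task on which backpropagation is
# sensitive to the initial weights (illustrative choice, not from the paper).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_network(seed, epochs=2000, lr=0.5):
    """Train a 2-2-1 sigmoid network by online backpropagation
    from a seed-dependent random initialization; return training MSE."""
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b_h = [rng.uniform(-1, 1) for _ in range(2)]
    w_ho = [rng.uniform(-1, 1) for _ in range(2)]
    b_o = rng.uniform(-1, 1)
    for _ in range(epochs):
        for (x, t) in DATA:
            # Forward pass.
            h = [sigmoid(w_ih[j][0] * x[0] + w_ih[j][1] * x[1] + b_h[j])
                 for j in range(2)]
            y = sigmoid(w_ho[0] * h[0] + w_ho[1] * h[1] + b_o)
            # Backward pass: deltas for output and hidden units.
            d_o = (y - t) * y * (1 - y)
            d_h = [d_o * w_ho[j] * h[j] * (1 - h[j]) for j in range(2)]
            # Gradient-descent updates.
            for j in range(2):
                w_ho[j] -= lr * d_o * h[j]
                w_ih[j][0] -= lr * d_h[j] * x[0]
                w_ih[j][1] -= lr * d_h[j] * x[1]
                b_h[j] -= lr * d_h[j]
            b_o -= lr * d_o
    # Final mean squared error on the training set.
    mse = 0.0
    for (x, t) in DATA:
        h = [sigmoid(w_ih[j][0] * x[0] + w_ih[j][1] * x[1] + b_h[j])
             for j in range(2)]
        y = sigmoid(w_ho[0] * h[0] + w_ho[1] * h[1] + b_o)
        mse += (y - t) ** 2
    return mse / len(DATA)

# The heuristic under study: train several networks from different
# random initializations and keep the one with the lowest training MSE.
results = {seed: train_network(seed) for seed in range(5)}
best_seed = min(results, key=results.get)
print(best_seed, results[best_seed])
```

The point of the paper is precisely that selecting by lowest training MSE, as in the last three lines, need not select the network that generalizes best to unseen data.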