Backpropagation Can Give Rise to Spurious Local Minima Even for Networks without Hidden Layers

We give an example of a neural net without hidden layers and with a sigmoid transfer function, together with a training set of binary vectors, for which the sum of the squared errors, regarded as a function of the weights, has a local minimum that is not a global minimum. The example consists of a set of 125 training instances, with four weights and a threshold to be learnt. We do not know whether substantially smaller binary examples exist.
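The error surface in question can be sketched concretely: a single sigmoid unit maps a binary input vector through a weighted sum minus a threshold, and the loss is the sum of squared differences from the targets over the training set. The snippet below sets this up for a hypothetical four-input training set; it is an illustration of the loss function only, not the paper's 125-instance example, which is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic transfer function.
    return 1.0 / (1.0 + np.exp(-z))

def sse(weights, threshold, X, targets):
    """Sum of squared errors of a single sigmoid unit (no hidden layer).

    X: (n, 4) array of binary input vectors
    weights: (4,) weight vector; threshold: scalar, subtracted from the net input.
    """
    outputs = sigmoid(X @ weights - threshold)
    return np.sum((outputs - targets) ** 2)

# Hypothetical tiny training set of binary vectors (NOT the paper's set).
X = np.array([[0, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0]], dtype=float)
targets = np.array([0.0, 1.0, 1.0, 0.0])

w = np.zeros(4)
theta = 0.0
# At the origin every output is sigmoid(0) = 0.5, so each squared
# error is 0.25 and the total is 4 * 0.25 = 1.0.
print(sse(w, theta, X, targets))  # → 1.0
```

Backpropagation on such a net is just gradient descent on `sse` in the five parameters `(w, theta)`; the paper's claim is that even this simple surface can contain a spurious local minimum in which descent can get trapped.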