Experiments on Learning by Back Propagation.

Abstract : Rumelhart, Hinton and Williams (Rumelhart 86) describe a learning procedure for layered networks of deterministic, neuron-like units. This paper describes further research on the learning procedure. We start by describing the units, the way they are connected, the learning procedure, and the extension to iterative nets. We then give an example in which a network learns a set of filters that enable it to discriminate formant-like patterns in the presence of noise. The speed of learning is strongly dependent on the shape of the surface formed by the error measure in weight space . We give examples of the shape of the error surface for a typical task and illustrate how an acceleration method speeds up descent in weight space. The main drawback of the learning procedure is the way it scales as the size of the task and the network increases. We give some preliminary results on scaling and show how the magnitude of the optimal weight changes depends on the fan-in of the units. Additional results illustrate the effects on learning speed of the amount of interaction between the weights. A variation of the learning procedure that back-propagates desired state information rather than error gradients is developed and compared with the standard procedure. Finally, we discuss the relationship between our iterative networks and the analog networks described by Hopefield and Tank (Hopfield 85). The learning procedure can discover appropriate weights in their kind of network, as well as determine an optimal schedule for varying the nonlinearity of the units during a search.