The utility of the back-propagation method in establishing suitable weights in a distributed adaptive network has been demonstrated repeatedly. Unfortunately, in many applications, the number of iterations required before convergence can be large. Modifications to the back-propagation algorithm described by Rumelhart et al. (1986) can greatly accelerate convergence. The modifications consist of three changes: 1) instead of updating the network weights after each pattern is presented to the network, the network is updated only after the entire repertoire of patterns to be learned has been presented, at which time the algebraic sums of all the weight changes are applied; 2) instead of keeping η, the "learning rate" (i.e., the multiplier on the step size), constant, it is varied dynamically so that the algorithm uses a near-optimum η, as determined by the local optimization topography; and 3) the momentum factor α is set to zero when, as signified by a failure of a step to reduce the total error, the information inherent in prior steps is more likely to be misleading than beneficial. Only after the network takes a useful step, i.e., one that reduces the total error, does α again assume a non-zero value. Considering the selection of weights in neural nets as a problem in classical nonlinear optimization theory, the rationale for algorithms seeking only those weights that produce the globally minimum error is reviewed and rejected.
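The three modifications described above can be sketched as follows. This is a minimal illustrative implementation on a one-parameter least-squares problem, not the paper's own code: the names `eta`, `alpha`, and the growth/shrink factors `grow` and `shrink` are assumptions chosen for demonstration, and the specific schedule for adapting η is only one plausible realization of "varied dynamically."

```python
# Sketch of batch back-propagation with an adaptive learning rate and
# momentum reset, per the three modifications in the abstract:
#   1) accumulate weight changes over the whole pattern repertoire (batch),
#   2) grow eta after a useful step, shrink it after a failed one,
#   3) zero the momentum factor alpha after a step that fails to reduce
#      the total error; restore it after the next useful step.
# Illustrative 1-D model: fit w so that w*x approximates y.

def total_error(w, patterns):
    """Total (summed) squared error over the entire repertoire."""
    return sum((w * x - y) ** 2 for x, y in patterns)

def batch_gradient(w, patterns):
    """Gradient of total_error, accumulated over all patterns."""
    return sum(2 * (w * x - y) * x for x, y in patterns)

def train(patterns, w=0.0, eta=0.01, alpha=0.9, steps=200,
          grow=1.05, shrink=0.7):
    err = total_error(w, patterns)
    prev_dw = 0.0       # previous accepted weight change (momentum term)
    momentum = alpha    # current momentum factor: alpha, or 0 after failure
    for _ in range(steps):
        dw = -eta * batch_gradient(w, patterns) + momentum * prev_dw
        new_err = total_error(w + dw, patterns)
        if new_err < err:
            # Useful step: accept it, enlarge eta, restore momentum.
            w, err, prev_dw = w + dw, new_err, dw
            eta *= grow
            momentum = alpha
        else:
            # Failed step: reject it, shrink eta, discard prior-step info.
            eta *= shrink
            prev_dw = 0.0
            momentum = 0.0
    return w, err
```

For example, on patterns `[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]` the routine converges toward `w ≈ 2`, with rejected steps leaving the weights untouched so the total error never increases.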
On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method.
Learning in a marine snail.
Calcium-mediated reduction of ionic currents: a biophysical memory trace.
David G. Luenberger et al., Linear and nonlinear programming.
Geoffrey E. Hinton et al., Learning internal representations by error propagation.
Numerical linear algebra aspects of globally convergent homotopy methods.
J. B. Rosen et al., Methods for global concave minimization: a bibliographic survey.
Alistair I. Mees et al., Convergence of an annealing algorithm.
James L. McClelland et al., Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations.
Fernando J. Pineda et al., Generalization of back propagation to recurrent and higher order neural networks.
George M. Whitson, An introduction to the parallel distributed processing model of cognition and some examples of how it is changing the teaching of artificial intelligence.
George M. Whitson et al., A testbed for sensory PDP models.