### Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization

Learning in connectionist networks, in which connection strengths are modified systematically so that the response of the network increasingly approximates the desired response, can be structured as an optimization problem. The widely used back propagation method of connectionist learning [19, 21, 18] is set in the context of nonlinear optimization. In this framework, the issues of stability, convergence, and parallelism are considered. As a form of gradient descent with fixed step size, back propagation is known to be unstable, which is illustrated using Rosenbrock's function. This is contrasted with stable methods that involve a line search in the gradient direction. The convergence criterion for connectionist problems involving binary functions is discussed relative to the behavior of gradient descent in the vicinity of local minima. A minimax criterion is compared with the least squares criterion. The contribution of the momentum term [19, 18] to more rapid convergence is interpreted relative to the geometry of the weight space. It is shown that in plateau regions of relatively constant gradient, the momentum term acts to increase the step size by a factor of 1/(1-μ), where μ is the momentum constant. In valley regions with steep sides, the momentum constant acts to focus the search direction toward the local minimum by averaging oscillations in the gradient.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-88-62. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/597
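The stability and momentum claims in the abstract can be checked numerically with a small sketch. It is not the report's own experiment. The starting point (-1.2, 1), the step size 0.01, the backtracking (Armijo) rule, and all function names are assumptions chosen for illustration, and the momentum update is assumed to take the common form Δw(t) = -η∇E + μΔw(t-1).

```python
def rosenbrock(x, y):
    """Rosenbrock's function; its minimum is f(1, 1) = 0."""
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def rosenbrock_grad(x, y):
    """Analytic gradient of Rosenbrock's function."""
    dx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x)
    dy = 200.0 * (y - x * x)
    return dx, dy


def fixed_step(x, y, step):
    """One gradient-descent update with a fixed step size."""
    gx, gy = rosenbrock_grad(x, y)
    return x - step * gx, y - step * gy


def line_search_step(x, y, t=1.0, shrink=0.5, c=1e-4, max_halvings=60):
    """One update along the negative gradient using a backtracking search.

    The trial step t is halved until the Armijo sufficient-decrease test
    passes. If no such step is found, the point is returned unchanged,
    so the function value never increases.
    """
    gx, gy = rosenbrock_grad(x, y)
    f0 = rosenbrock(x, y)
    g2 = gx * gx + gy * gy
    for _ in range(max_halvings):
        nx, ny = x - t * gx, y - t * gy
        if rosenbrock(nx, ny) <= f0 - c * t * g2:
            return nx, ny
        t *= shrink
    return x, y


def momentum_plateau_ratio(mu, step=0.1, grad=1.0, iters=500):
    """Steady-state momentum update divided by the plain update.

    On a plateau the gradient is (assumed) constant, so the update
    delta(t) = -step * grad + mu * delta(t - 1) is a geometric series
    whose limit is -step * grad / (1 - mu).
    """
    delta = 0.0
    for _ in range(iters):
        delta = -step * grad + mu * delta
    return delta / (-step * grad)


if __name__ == "__main__":
    start = (-1.2, 1.0)
    print("f(start):", rosenbrock(*start))
    print("f after one fixed step:", rosenbrock(*fixed_step(*start, 0.01)))
    point = start
    for _ in range(200):
        point = line_search_step(*point)
    print("f after 200 line-search steps:", rosenbrock(*point))
    print("momentum ratio for mu = 0.9:", momentum_plateau_ratio(0.9))
```

With these assumed settings, a single fixed step raises the function value, whereas the line-search update cannot raise it by construction. The momentum ratio approaches 1/(1-μ), which is 10 for μ = 0.9.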

[1]  Joseph Henry Wegstein,et al.  Accelerating convergence of iterative processes , 1958, CACM.

[2]  Roger Fletcher,et al.  A Rapidly Convergent Descent Method for Minimization , 1963, Comput. J..

[3]  John A. Nelder,et al.  A Simplex Method for Function Minimization , 1965, Comput. J..

[4]  Y. Bard On a numerical instability of Davidon-like methods , 1968 .

[5]  J. D. Pearson On Variable Metric Methods of Minimization , 1968 .

[6]  R. Fletcher,et al.  A New Approach to Variable Metric Algorithms , 1970, Comput. J..

[8]  S. Vajda,et al.  Numerical Methods for Non-Linear Optimization , 1973 .

[9]  Kumpati S. Narendra,et al.  Adaptation and learning in automatic systems , 1974 .

[10]  D. J. Bell,et al.  Numerical Methods for Unconstrained Optimization , 1979 .

[11]  T. M. Williams,et al.  Practical Methods of Optimization. Vol. 1: Unconstrained Optimization , 1980 .

[12]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[15]  Geoffrey E. Hinton,et al.  Experiments on Learning by Back Propagation. , 1986 .

[16]  Lokendra Shastri,et al.  Learning Phonetic Features Using Connectionist Networks , 1987, IJCAI.

[17]  R. Fletcher Practical Methods of Optimization , 1988 .

[18]  D Zipser,et al.  Learning the hidden structure of speech. , 1988, The Journal of the Acoustical Society of America.