A scaled conjugate gradient algorithm for fast supervised learning

A supervised learning algorithm (Scaled Conjugate Gradient, SCG) is introduced. The performance of SCG is benchmarked against that of the standard back propagation algorithm (BP) (Rumelhart, Hinton, & Williams, 1986), the conjugate gradient algorithm with line search (CGL) (Johansson, Dowla, & Goodman, 1990), and the one-step Broyden-Fletcher-Goldfarb-Shanno memoryless quasi-Newton algorithm (BFGS) (Battiti, 1990). SCG is fully automated, includes no critical user-dependent parameters, and avoids the time-consuming line search that CGL and BFGS use in each iteration to determine an appropriate step size. Experiments show that SCG is considerably faster than BP, CGL, and BFGS.
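
To give a concrete picture of how the line search is replaced, the following is a minimal Python/NumPy sketch of an SCG-style update loop: the step size is computed in closed form from a finite-difference curvature estimate that is regularized by a Levenberg-Marquardt style scaling parameter lambda. The function and parameter names (scg_minimize, f, grad, the default tolerances) are illustrative choices for this sketch, not the paper's reference implementation.

    import numpy as np

    def scg_minimize(f, grad, w, max_iters=500, sigma0=1e-4, lam=1e-6, tol=1e-8):
        # Scaled conjugate gradient minimization, sketched after the SCG scheme.
        # f    : callable returning the scalar error E(w)
        # grad : callable returning the gradient E'(w) as a 1-D array
        # w    : initial weight vector (1-D numpy array)
        lam_bar = 0.0
        r = -grad(w)              # negative gradient (steepest-descent direction)
        p = r.copy()              # initial search direction
        success = True
        n = w.size                # restart period (problem dimension)

        for k in range(1, max_iters + 1):
            p_norm2 = p @ p
            if success:
                # Second-order information from a finite difference of the
                # gradient, instead of an explicit Hessian or a line search.
                sigma = sigma0 / np.sqrt(p_norm2)
                s = (grad(w + sigma * p) - grad(w)) / sigma
                delta = p @ s                     # curvature estimate p^T H p
            # Scale the curvature with the Levenberg-Marquardt parameter lambda.
            delta += (lam - lam_bar) * p_norm2
            if delta <= 0.0:
                # Make the scaled curvature positive definite.
                lam_bar = 2.0 * (lam - delta / p_norm2)
                delta = -delta + lam * p_norm2
                lam = lam_bar
            # Closed-form step size from the local quadratic model.
            mu = p @ r
            alpha = mu / delta
            # Comparison parameter: how well the quadratic model predicted the
            # actual error reduction.
            Delta = 2.0 * delta * (f(w) - f(w + alpha * p)) / mu ** 2
            if Delta >= 0.0:
                # Successful step: accept the new weights.
                w = w + alpha * p
                r_new = -grad(w)
                lam_bar = 0.0
                success = True
                if k % n == 0:
                    p = r_new.copy()              # periodic restart
                else:
                    beta = (r_new @ r_new - r_new @ r) / mu
                    p = r_new + beta * p          # new conjugate search direction
                r = r_new
                if Delta >= 0.75:
                    lam *= 0.25                   # trust the quadratic model more
            else:
                # Unsuccessful step: keep the old weights, raise lambda.
                lam_bar = lam
                success = False
            if Delta < 0.25:
                lam += delta * (1.0 - Delta) / p_norm2
            if np.linalg.norm(r) < tol:
                break                             # gradient (nearly) zero
        return w

The comparison parameter Delta measures how well the local quadratic model predicted the actual error reduction; raising and lowering lambda in response to it is what lets this scheme fix the step size without the per-iteration line search used by CGL and BFGS.
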

[1]  Philip E. Gill, et al. Practical Optimization, 1981.

[2]  Roberto Battiti, et al. Accelerated Backpropagation Learning: Two Optimization Methods, Complex Syst., 1989.

[3]  Martin Fodslette Møller, et al. Learning by Conjugate Gradients, IMYCS, 1990.

[4]  Gerald Tesauro, et al. Scaling Relationships in Back-Propagation Learning: Dependence on Training Set Size, Complex Syst., 1987.

[5]  Geoffrey E. Hinton. Connectionist Learning Procedures, Artif. Intell., 1989.

[6]  M. J. D. Powell, et al. Restart procedures for the conjugate gradient method, Math. Program., 1977.

[7]  Magnus R. Hestenes, et al. Conjugate Direction Methods in Optimization, 1980.

[8]  T. Yoshida. A learning algorithm for multilayered neural networks: a Newton method using automatic differentiation, IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991.

[9]  J. S. Judd, et al. Complexity of Connectionist Learning with Various Node Functions, 1987.

[10]  B. Boser, et al. Backpropagation Learning for Multi-layer Feed-forward Neural Networks Using the Conjugate Gradient Method, IEEE Transactions on Neural Networks, 1991.

[11]  J. T. Schwartz, et al. The new connectionism: developing relationships between neuroscience and artificial intelligence, 1989.

[12]  Raymond L. Watrous. Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization, 1988.

[13]  Ian Pratt, et al. The artificial intelligence debate: false starts, real foundations, 1990.

[14]  Farid U. Dowla, et al. Backpropagation Learning for Multilayer Feed-Forward Neural Networks Using the Conjugate Gradient Method, Int. J. Neural Syst., 1991.

[15]  Yann LeCun, et al. Generalization and network design strategies, 1989.

[16]  Heinz Mühlenbein, et al. Limitations of multi-layer perceptron networks - steps towards genetic neural networks, Parallel Comput., 1990.

[17]  Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[18]  M. Møller, et al. Supervised learning on large redundant training sets, Neural Networks for Signal Processing II: Proceedings of the 1992 IEEE Workshop, 1992.

[19]  R. Fletcher. Practical Methods of Optimization, 1988.

[20]  Roberto Battiti, et al. BFGS Optimization for Faster and Automated Supervised Learning, 1990.