Simulated annealing and weight decay in adaptive learning: the SARPROP algorithm

A problem with gradient descent algorithms is that they can converge to poorly performing local minima. Global optimization algorithms address this problem, but at the cost of greatly increased training times. This work examines combining gradient descent with the global optimization technique of simulated annealing (SA). Simulated annealing, in the form of noise and weight decay, is added to resilient backpropagation (RPROP), a powerful gradient descent algorithm for training feedforward neural networks. The resulting algorithm, SARPROP, is shown through various simulations not only to escape local minima, but also to maintain, and often improve, the training times of the RPROP algorithm. In addition, SARPROP may be used with a restart training phase, which allows a more thorough search of the error surface and provides an automatic annealing schedule.
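To make the idea concrete, below is a minimal sketch of an RPROP-style, sign-based weight update augmented with annealed noise and weight decay, in the spirit of SARPROP. The exact SARPROP update rules, constants, annealing schedule, and restart mechanism are not given in this abstract; every hyperparameter and the precise way the noise and weight-decay terms enter the update here are illustrative assumptions, not the authors' published method.

```python
# Sketch: RPROP-like update with simulated-annealing noise and weight decay.
# All constants and the annealing schedule below are assumptions for illustration.
import numpy as np


def rprop_sa_step(w, grad, prev_grad, step, epoch,
                  eta_plus=1.2, eta_minus=0.5,
                  step_min=1e-6, step_max=50.0,
                  decay=1e-4, noise_scale=0.01, anneal=0.01,
                  rng=np.random.default_rng(0)):
    """One sign-based update with annealed noise and weight decay.

    w, grad, prev_grad, step are arrays of the same shape; returns the
    updated weights, the effective gradient to carry forward, and the
    per-weight step sizes.
    """
    temperature = np.exp(-anneal * epoch)  # assumed annealing schedule

    # Weight decay nudges weights toward zero; the noise term lets the
    # search escape shallow local minima, and both fade as training proceeds.
    g = grad + decay * temperature * w
    g = g + noise_scale * temperature * rng.standard_normal(w.shape)

    # RPROP-style step-size adaptation from the sign of successive gradients.
    sign_change = np.sign(g) * np.sign(prev_grad)
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)

    # On a sign flip, skip this weight's update (iRPROP-style choice).
    g = np.where(sign_change < 0, 0.0, g)
    w_new = w - np.sign(g) * step
    return w_new, g, step
```

In the full algorithm a restart phase would periodically reset the step sizes to allow a fresh search of the error surface, providing the automatic annealing schedule mentioned in the abstract; that machinery is omitted from this sketch.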
