A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization

A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. We propose a modified parameter update rule under which each individual perturbation of the parameter vector contributes a decrease in error, allowing substantially faster learning. Furthermore, the modified algorithm supports learning of time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm and present simulation results for dynamic trajectory learning in recurrent networks.
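The perturbative update at the heart of the method can be sketched as follows. This is a minimal illustration in the spirit of the Dembo-Kailath mechanism, not the authors' exact rule: the names error_fn, mu, and sigma, and the choice of a Bernoulli perturbation, are assumptions made here for concreteness.

```python
import numpy as np

def stochastic_error_descent(error_fn, p, mu=0.1, sigma=0.01, steps=5000, rng=None):
    """Sketch of model-free stochastic error descent.

    error_fn maps a parameter vector to a scalar error E(p); no internal
    network structure is assumed.  Each step perturbs all parameters in
    parallel by +/- sigma, measures the resulting change in error, and
    steps against the perturbation direction scaled by that change, so
    the expected update follows -grad E.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(steps):
        pi = sigma * rng.choice([-1.0, 1.0], size=p.shape)  # parallel random perturbation
        delta_e = error_fn(p + pi) - error_fn(p)            # two error evaluations only
        p = p - mu * delta_e * pi / sigma**2                # E[delta_e * pi] = sigma^2 * grad E
    return p

# Toy usage: minimize E(p) = ||p - 1||^2 without ever computing its gradient.
p_star = stochastic_error_descent(lambda p: float(np.sum((p - 1.0) ** 2)),
                                  p=np.zeros(5))
```

Note that only two error evaluations are needed per update regardless of the number of parameters, since all parameters are perturbed in parallel; this is the source of the speedup over sequential per-weight perturbation schemes.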

[1] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[2] Barak A. Pearlmutter. Learning State Space Trajectories in Recurrent Neural Networks, 1989, Neural Computation.

[3] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[4] Bernard Widrow, et al. 30 years of adaptive neural networks: perceptron, Madaline, and backpropagation, 1990, Proceedings of the IEEE.

[5] Amir Dembo, Thomas Kailath. Model-free distributed learning, 1990, IEEE Transactions on Neural Networks.

[6] Stuart Haber, et al. A VLSI-efficient technique for generating multiple uncorrelated noise sources and its application to stochastic neural networks, 1991.

[7] Marwan A. Jabri, et al. Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks, 1991, Neural Computation.

[8] Kurt W. Fleischer, et al. Analog VLSI Implementation of Gradient Descent, 1992, NIPS.

[9] Marwan A. Jabri, et al. Summed Weight Neuron Perturbation: An O(N) Improvement Over Weight Perturbation, 1992, NIPS.

[10] Jürgen Schmidhuber. A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks, 1992, Neural Computation.

[11] Ron Meir, et al. A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks, 1992, NIPS.

[12] Jacob Barhen, et al. Learning a trajectory using adjoint functions and teacher forcing, 1992, Neural Networks.