A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks

Typical methods for gradient descent in neural network learning calculate derivatives from a detailed model of the network. This requires extensive, time-consuming calculations for each pattern presentation, as well as a numerical precision that is difficult to achieve in VLSI. We present here a perturbation technique that measures, rather than calculates, the gradient. Because the technique uses the actual network as the measuring device, errors in modeling neuron activations and synaptic weights do not cause errors in the gradient descent. The method is parallel in nature and easy to implement in VLSI. We describe the theory of the algorithm, analyze its domain of applicability, report simulations using it, and outline a hardware implementation.
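As a concrete illustration of the measurement idea (this sketch is ours, not code from the paper; the function names, the learning rate, and the perturbation size sigma are assumptions made for the example), a single learning step perturbs all weights in parallel, measures the change in network error, and moves each weight against its estimated gradient component:

    import numpy as np

    def parallel_perturbation_step(w, error_fn, lr=0.05, sigma=1e-3):
        # Perturb every weight simultaneously by a random +/- sigma.
        delta = sigma * np.random.choice([-1.0, 1.0], size=w.shape)
        # Measure, rather than calculate, the resulting change in error.
        de = error_fn(w + delta) - error_fn(w)
        # de / delta_i estimates dE/dw_i: to first order the cross terms
        # average out because the perturbations are independent.
        return w - lr * de / delta

    # Toy usage: descend a quadratic error surface.
    target = np.array([1.0, -2.0, 0.5])
    error_fn = lambda w: float(np.sum((w - target) ** 2))
    w = np.zeros(3)
    for _ in range(2000):
        w = parallel_perturbation_step(w, error_fn)
    print(w)  # settles near target, up to perturbation noise

In hardware, both error readings would come from the physical network itself, which is why modeling errors in neurons and synapses would not corrupt the measured gradient.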
