Gradient calculations for dynamic recurrent neural networks: a survey

This paper surveys learning algorithms for recurrent neural networks with hidden units and puts the various techniques into a common framework. The author discusses fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an on-line technique that uses adjoint equations, and variations thereof are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. The author discusses the advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continues with some "tricks of the trade" for training, using, and simulating continuous-time and recurrent neural networks. The author presents some simulations and, at the end, addresses issues of computational complexity and learning speed.
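The survey's central point is that these algorithms compute the same error gradient by different routes. A minimal sketch, assuming a clocked (discrete-time) network y(t+1) = tanh(W y(t) + x(t)) with squared error on every step, compares backpropagation through time with forward propagation (also known as real-time recurrent learning). The network form, the loss, and the function names `forward`, `loss`, `grad_bptt`, and `grad_forward` are illustrative assumptions, not the survey's notation; the continuous-time networks the survey also treats are not covered here.

```python
import math
import random


def forward(w, xs, y0):
    """Run y(t+1) = tanh(W y(t) + x(t)); return the states y(0), ..., y(T)."""
    n = len(y0)
    ys = [list(y0)]
    for x in xs:
        prev = ys[-1]
        ys.append([math.tanh(sum(w[i][j] * prev[j] for j in range(n)) + x[i])
                   for i in range(n)])
    return ys


def loss(w, xs, targets, y0):
    """E = 1/2 * sum over t and i of (y_i(t+1) - d_i(t))^2."""
    ys = forward(w, xs, y0)
    return 0.5 * sum((ys[t + 1][i] - targets[t][i]) ** 2
                     for t in range(len(xs)) for i in range(len(y0)))


def grad_bptt(w, xs, targets, y0):
    """Backpropagation through time: store the trajectory, then sweep backward."""
    n, T = len(y0), len(xs)
    ys = forward(w, xs, y0)
    g = [[0.0] * n for _ in range(n)]
    a = [0.0] * n  # part of dE/dy(t+1) flowing back from later steps (the adjoint)
    for t in reversed(range(T)):
        # total dE/dy_i(t+1): the direct error plus the backpropagated part
        e = [ys[t + 1][i] - targets[t][i] + a[i] for i in range(n)]
        # dE/ds_i(t), where s_i(t) = sum_j w_ij y_j(t) + x_i(t)
        delta = [e[i] * (1.0 - ys[t + 1][i] ** 2) for i in range(n)]
        for i in range(n):
            for j in range(n):
                g[i][j] += delta[i] * ys[t][j]
        # pass the adjoint back through W to y(t)
        a = [sum(w[i][k] * delta[i] for i in range(n)) for k in range(n)]
    return g


def grad_forward(w, xs, targets, y0):
    """Forward propagation (RTRL): carry dy/dW forward, accumulate on-line."""
    n, T = len(y0), len(xs)
    ys = forward(w, xs, y0)
    # p[k][i][j] = dy_k(t)/dw_ij; zero at t = 0 because y(0) does not depend on W
    p = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    g = [[0.0] * n for _ in range(n)]
    for t in range(T):
        new_p = [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for k in range(n):
            fprime = 1.0 - ys[t + 1][k] ** 2
            for i in range(n):
                for j in range(n):
                    ds_dw = sum(w[k][l] * p[l][i][j] for l in range(n))
                    if k == i:
                        ds_dw += ys[t][j]  # explicit dependence of s_k on w_kj
                    new_p[k][i][j] = fprime * ds_dw
        p = new_p
        # the gradient contribution of step t is available immediately
        for k in range(n):
            err = ys[t + 1][k] - targets[t][k]
            for i in range(n):
                for j in range(n):
                    g[i][j] += err * p[k][i][j]
    return g


if __name__ == "__main__":
    random.seed(0)
    n, T = 3, 6
    w = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    xs = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(T)]
    targets = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(T)]
    y0 = [0.0] * n
    gb = grad_bptt(w, xs, targets, y0)
    gf = grad_forward(w, xs, targets, y0)
    # largest disagreement between the two methods (rounding-level)
    print(max(abs(gb[i][j] - gf[i][j]) for i in range(n) for j in range(n)))
```

Both routines return the same matrix dE/dW up to rounding. The trade-off they illustrate: the backward sweep costs O(n^2) per step but must keep the whole trajectory, so memory grows with T, while forward propagation carries n^3 sensitivities p[k][i][j] and so costs O(n^4) per step, in exchange for having the gradient available on-line without storing the past.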
