Recurrent Neural Net Learning and Vanishing Gradient

Recurrent nets are in principle capable of storing past inputs to produce the currently desired output. This property makes them useful for time series prediction and process control. Practical applications, however, involve temporal dependencies spanning many time steps between relevant inputs and desired outputs, and in this case gradient descent learning takes too much time. This learning-time problem arises because the error vanishes as it gets propagated back. The decaying error flow is analyzed theoretically. Methods that attempt to overcome the vanishing gradient are then discussed. Finally, experiments comparing conventional algorithms with alternative methods are presented. These experiments show that the advanced methods can learn long-time-lag problems in reasonable time.
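For readers unfamiliar with the argument, the following is a minimal sketch of the standard error-flow analysis; the symbols f, net_j, and w_{ij} (activation function, net input of unit j, and recurrent weights) are assumed notation and do not appear in the abstract itself. Backpropagation through time scales the error flowing from unit u at time t back to unit v at time t-q by

    \frac{\partial \vartheta_v(t-q)}{\partial \vartheta_u(t)}
    = \sum_{l_1=1}^{n} \cdots \sum_{l_{q-1}=1}^{n}
      \prod_{m=1}^{q} f'\big(\mathrm{net}_{l_m}(t-m)\big)\, w_{l_m l_{m-1}},
    \qquad l_0 = u,\; l_q = v .

If every factor satisfies |f'(\mathrm{net})\, w| < 1, each product decays exponentially with the lag q, so the backpropagated error effectively vanishes over long time lags; if the factors exceed 1, the error blows up instead and the weights tend to oscillate.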
