The problem of learning long-term dependencies in recurrent networks

The authors seek to train recurrent neural networks to map input sequences to output sequences, for applications such as sequence recognition or production. Results are presented showing that learning long-term dependencies in such recurrent networks by gradient descent is a very difficult task. It is shown how this difficulty arises when bits of information must be robustly latched with certain attractors: the derivatives of the output at time t with respect to the unit activations at time zero then tend rapidly to zero as t increases, for most input values. In such a situation, simple gradient descent techniques appear inappropriate, and the consideration of alternative optimization methods and architectures is suggested.
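As a brief illustrative sketch of the vanishing-derivative argument (using notation not in the abstract: a_t for the unit activations at time t, u_t for the input, W for the recurrent weight matrix, and f for the unit nonlinearity), consider the chain rule applied to the recurrence

a_t = f(W a_{t-1} + u_t), \qquad \frac{\partial a_t}{\partial a_0} = \prod_{k=1}^{t} \frac{\partial a_k}{\partial a_{k-1}} = \prod_{k=1}^{t} \mathrm{diag}\!\left(f'(W a_{k-1} + u_k)\right) W .

If each factor in the product has largest singular value at most \rho < 1, which is the regime in which the network can robustly store a bit of information in an attractor, then

\left\| \frac{\partial a_t}{\partial a_0} \right\| \le \rho^{\,t} \longrightarrow 0 \quad \text{as } t \to \infty ,

so the influence of early activations (and hence of early inputs) on the gradient decays exponentially with t, which is why simple gradient descent struggles with long-term dependencies.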
