Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies

Sepp Hochreiter
Fakultät für Informatik, Technische Universität München, 80290 München, Germany

Yoshua Bengio
Dépt. Informatique et Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada

Paolo Frasconi
Dept. of Systems and Computer Science, University of Florence, Firenze, Italy

Jürgen Schmidhuber
IDSIA, Lugano, Switzerland
