Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies

Sepp Hochreiter
Fakultät für Informatik, Technische Universität München, 80290 München, Germany

Yoshua Bengio
Dépt. Informatique et Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada

Paolo Frasconi
Dept. of Systems and Computer Science, University of Florence, Firenze, Italy

Jürgen Schmidhuber
IDSIA, Lugano, Switzerland
