Recurrent Neural Net Learning and Vanishing Gradient

Recurrent nets are in principle capable of storing past inputs to produce the currently desired output. This property makes them useful for time series prediction and process control. Practical applications, however, involve temporal dependencies spanning many time steps between relevant inputs and desired outputs, and in this case gradient descent learning takes too much time. This learning-time problem arises because the error vanishes as it gets propagated back. The decaying error flow is analyzed theoretically. Methods that attempt to overcome the vanishing gradient are then discussed. Finally, experiments comparing conventional algorithms with alternative methods are presented. These experiments show that the advanced methods can learn long-time-lag problems in reasonable time.
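For readers unfamiliar with the argument, the following is a minimal sketch of the standard error-flow analysis; the symbols f, net_j, and w_{ij} (activation function, net input of unit j, and recurrent weights) are assumed notation and do not appear in the abstract itself. Backpropagation through time scales the error flowing from unit u at time t back to unit v at time t-q by

    \frac{\partial \vartheta_v(t-q)}{\partial \vartheta_u(t)}
    = \sum_{l_1=1}^{n} \cdots \sum_{l_{q-1}=1}^{n}
      \prod_{m=1}^{q} f'\big(\mathrm{net}_{l_m}(t-m)\big)\, w_{l_m l_{m-1}},
    \qquad l_0 = u,\; l_q = v .

If every factor satisfies |f'(\mathrm{net})\, w| < 1, each product decays exponentially with the lag q, so the backpropagated error effectively vanishes over long time lags; if the factors exceed 1, the error blows up instead and the weights tend to oscillate.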
