Learning long-term dependencies in segmented-memory recurrent neural networks with backpropagation of error

Abstract: In general, recurrent neural networks have difficulties in learning long-term dependencies. The segmented-memory recurrent neural network (SMRNN) architecture together with the extended real-time recurrent learning (eRTRL) algorithm was proposed to circumvent this problem. Due to its computational complexity, eRTRL becomes impractical with increasing network size. Therefore, we introduce the less complex extended backpropagation through time (eBPTT) for SMRNN together with a layer-local unsupervised pre-training procedure. A comparison on the information latching problem showed that eRTRL is better able to handle the latching of information over longer periods of time, even though eBPTT achieved better generalisation when training was successful. Further, pre-training significantly improved the ability to learn long-term dependencies with eBPTT. The proposed eBPTT algorithm is therefore suited for tasks that require large networks, for which eRTRL is impractical. The pre-training procedure itself is independent of the supervised learning algorithm and can improve learning in SMRNNs in general.
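The SMRNN architecture couples a symbol-level recurrent state, updated at every time step, with a segment-level recurrent state that is updated only at segment boundaries. The sketch below illustrates this forward pass; it is a minimal illustration under stated assumptions (fixed segment length d, logistic activations, illustrative weight names), not the authors' reference implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class SMRNNSketch:
    """Minimal sketch of a segmented-memory RNN forward pass.

    Assumptions (not taken from the paper): fixed segment length d,
    logistic activations, and illustrative weight names. The symbol-level
    state x is updated every step; the segment-level state y is updated
    only at segment boundaries.
    """

    def __init__(self, n_in, n_sym, n_seg, n_out, d, rng=np.random.default_rng(0)):
        self.d = d                                      # segment length
        self.W_xu = rng.normal(0, 0.1, (n_sym, n_in))   # input   -> symbol level
        self.W_xx = rng.normal(0, 0.1, (n_sym, n_sym))  # symbol  -> symbol (recurrent)
        self.W_yx = rng.normal(0, 0.1, (n_seg, n_sym))  # symbol  -> segment level
        self.W_yy = rng.normal(0, 0.1, (n_seg, n_seg))  # segment -> segment (recurrent)
        self.W_zy = rng.normal(0, 0.1, (n_out, n_seg))  # segment -> output

    def forward(self, inputs):
        x = np.zeros(self.W_xx.shape[0])   # symbol-level state
        y = np.zeros(self.W_yy.shape[0])   # segment-level state
        for t, u in enumerate(inputs, start=1):
            x = sigmoid(self.W_xu @ u + self.W_xx @ x)      # updated every step
            if t % self.d == 0:                             # segment boundary reached
                y = sigmoid(self.W_yx @ x + self.W_yy @ y)  # updated every d steps
        return sigmoid(self.W_zy @ y)                       # sequence-level output
```

For a sequence-classification setting such as the information latching problem, only the output after the final symbol would be compared with the target; eBPTT or eRTRL would then propagate the error back through both state levels.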
