LSTM recurrent networks learn simple context-free and context-sensitive languages

Previous work on learning regular languages from exemplary training sequences showed that long short-term memory (LSTM) outperforms traditional recurrent neural networks (RNNs). We demonstrate LSTM's superior performance on context-free language benchmarks for RNNs, and show that it works even better than previous hardwired or highly specialized architectures. To the best of our knowledge, LSTM variants are also the first RNNs to learn a simple context-sensitive language, namely a^n b^n c^n.
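To make the a^n b^n c^n benchmark concrete, below is a minimal sketch of the next-symbol prediction task on strings of the form S a^n b^n c^n E. This is not the authors' original training setup (they used online learning with multiple legal-next-symbol targets); it is a simplified, assumed PyTorch reimplementation, and the hidden size, learning rate, and range of n are illustrative guesses.

```python
# Minimal sketch (assumed setup, not the paper's): an LSTM reads a string
# S a^n b^n c^n E one symbol at a time and predicts the next symbol.
# Once the first 'b' appears, the rest of the string is fully determined,
# so we evaluate accuracy only on that deterministic suffix.
import random
import torch
import torch.nn as nn

SYMBOLS = ["S", "a", "b", "c", "E"]            # start, a, b, c, end markers
IDX = {s: i for i, s in enumerate(SYMBOLS)}

def make_example(n):
    """Return (input, target) index tensors for S a^n b^n c^n E."""
    seq = ["S"] + ["a"] * n + ["b"] * n + ["c"] * n + ["E"]
    x = torch.tensor([IDX[s] for s in seq[:-1]])
    y = torch.tensor([IDX[s] for s in seq[1:]])   # next-symbol targets
    return x, y

class Predictor(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(len(SYMBOLS), 8)
        self.lstm = nn.LSTM(8, hidden, batch_first=True)
        self.out = nn.Linear(hidden, len(SYMBOLS))

    def forward(self, x):
        h, _ = self.lstm(self.embed(x.unsqueeze(0)))   # (1, L, hidden)
        return self.out(h.squeeze(0))                  # (L, num_symbols)

model = Predictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    x, y = make_example(random.randint(1, 10))   # train on short strings
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Probe generalization to a longer string than any seen in training.
n = 20
x, y = make_example(n)
pred = model(x).argmax(dim=-1)
cut = n + 1                                      # position of the first 'b'
print("deterministic suffix correct:",
      bool((pred[cut:] == y[cut:]).all()))
```

Before the first 'b' the legal continuation is ambiguous (another 'a' or the first 'b'), which is why the check above starts at the first 'b'; a fuller reproduction would score the network on whether it assigns probability only to the legal next symbols at every step.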
