Learning Precise Timing with LSTM Recurrent Networks

The temporal distance between events conveys information essential for numerous sequential tasks such as motor control and rhythm detection. While Hidden Markov Models tend to ignore this information, recurrent neural networks (RNNs) can in principle learn to make use of it. We focus on Long Short-Term Memory (LSTM) because it has been shown to outperform other RNNs on tasks involving long time lags. We find that LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes spaced either 50 or 49 time steps apart without the help of any short training exemplars. Without external resets or teacher forcing, our LSTM variant also learns to generate stable streams of precisely timed spikes and other highly nonlinear periodic patterns. This makes LSTM a promising approach for tasks that require the accurate measurement or generation of time intervals.
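
As a rough sketch of the peephole architecture described in the abstract, the following NumPy code shows one forward step of a single peephole LSTM cell. The parameter names (input weights W_*, recurrent weights U_*, diagonal peephole weights v_*, biases b_*) and the tanh output squashing are illustrative assumptions following the common peephole formulation, not necessarily the exact equations used in the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x, h_prev, c_prev, p):
    # One forward step of a peephole LSTM cell (illustrative sketch).
    # p holds input weights W_*, recurrent weights U_*, diagonal peephole
    # weights v_*, and biases b_* for the input (i), forget (f) and
    # output (o) gates and the cell candidate (g).
    # Input and forget gates "peek" at the previous cell state c_prev.
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["v_i"] * c_prev + p["b_i"])
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["v_f"] * c_prev + p["b_f"])
    g = np.tanh(p["W_g"] @ x + p["U_g"] @ h_prev + p["b_g"])
    c = f * c_prev + i * g
    # The output gate peeks at the freshly updated cell state c.
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_prev + p["v_o"] * c + p["b_o"])
    h = o * np.tanh(c)
    return h, c

Exposing the internal cell state to the gates through these peephole weights is what allows the gates to open and close relative to the slowly evolving cell contents, the mechanism behind the precise interval measurement and spike generation claimed above.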
