Comparing LSTM Recurrent Networks with Spiking Recurrent Networks on the Recognition of Spoken Digits.

One advantage of spiking recurrent neural networks (SNNs) is their ability to categorise data using a synchrony-based latching mechanism. This is particularly useful in problems where timewarping is encountered, such as speech recognition. Differentiable recurrent neural networks (RNNs), by contrast, fail at tasks involving difficult timewarping, despite having sequence learning capabilities superior to those of SNNs. In this paper we demonstrate that Long Short-Term Memory (LSTM) is an RNN capable of robustly categorising timewarped speech data, thus combining the most useful features of both paradigms. We compare its performance to that of SNNs on two variants of a spoken digit identification task, using data from an international competition. The first task (described in Nature [15]) required the categorisation of spoken digits with only a single training exemplar, and was specifically designed to test robustness to timewarping. Here LSTM performed better than all the SNNs in the competition. The second task was to predict spoken digits using a larger training set. Here LSTM greatly outperformed an SNN-like model found in the literature. These results suggest that LSTM has a place in domains that require the learning of large timewarped datasets, such as automatic speech recognition.
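To make the mechanism concrete, the sketch below shows one forward step of an LSTM cell in NumPy: the multiplicative input, forget, and output gates let the cell state retain a detected feature across input stretches of varying duration, which is what allows the network to tolerate timewarped speech. This is a minimal illustration under stated assumptions, not the configuration used in the experiments; the layer sizes, random parameters, MFCC-like frame dimension, and linear digit readout are all assumptions made for the example.

import numpy as np

# Minimal sketch of one LSTM forward step (assumed sizes and parameter
# names; not the paper's exact architecture).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden and cell states."""
    W_i, W_f, W_o, W_g, b_i, b_f, b_o, b_g = params
    z = np.concatenate([x, h_prev])      # current input plus recurrent signal
    i = sigmoid(W_i @ z + b_i)           # input gate: admit new information
    f = sigmoid(W_f @ z + b_f)           # forget gate: retain or clear memory
    o = sigmoid(W_o @ z + b_o)           # output gate: expose the cell state
    g = np.tanh(W_g @ z + b_g)           # candidate cell update
    c = f * c_prev + i * g               # gated cell state bridges long gaps
    h = o * np.tanh(c)                   # hidden state read through the gate
    return h, c

# Hypothetical usage: run 50 frames of a 13-dimensional (MFCC-like)
# spoken digit through the cell, then classify from the final state.
rng = np.random.default_rng(0)
n_in, n_hid, n_digits = 13, 32, 10
params = tuple(rng.normal(0.0, 0.1, (n_hid, n_in + n_hid)) for _ in range(4)) \
       + tuple(np.zeros(n_hid) for _ in range(4))
h, c = np.zeros(n_hid), np.zeros(n_hid)
for frame in rng.normal(size=(50, n_in)):
    h, c = lstm_step(frame, h, c, params)
W_out = rng.normal(0.0, 0.1, (n_digits, n_hid))   # untrained linear readout
print("predicted digit:", int(np.argmax(W_out @ h)))

Because the forget gate rescales the stored cell state multiplicatively rather than overwriting it at every step, a feature detected early in a word can survive an arbitrarily stretched interval, which is exactly the property the timewarping tasks probe.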

[1] David J. Burr. Speech Recognition Experiments with Perceptrons. NIPS, 1987.

[2] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.

[3] Ronald J. Williams, et al. Gradient-based learning algorithms for recurrent connectionist networks, 1990.

[4] Hervé Bourlard, et al. Connectionist Speech Recognition: A Hybrid Approach, 1993.

[5] Anthony J. Robinson. An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Networks, 1994.

[6] J. J. Hopfield. Pattern recognition computation using action potential timing for stimulus representation. Nature, 1995.

[7] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 1997.

[8] Wolfgang Maass and Christopher M. Bishop, editors. Pulsed Neural Networks, 1998.

[9] J. J. Hopfield, et al. What is a moment? "Cortical" sensory integration over a brief interval. Proceedings of the National Academy of Sciences of the United States of America, 2000.

[10] Sander M. Bohte, et al. SpikeProp: backpropagation for networks of spiking neurons. ESANN, 2000.

[11] Sepp Hochreiter, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.

[12] Felix A. Gers and Jürgen Schmidhuber. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Trans. Neural Networks, 2001.

[13] Felix A. Gers, et al. Learning Precise Timing with LSTM Recurrent Networks. J. Mach. Learn. Res., 2003.

[14] Douglas Eck. Finding downbeats with a relaxation oscillator. Psychological Research, 2002.

[15] Douglas Eck and Jürgen Schmidhuber. Finding temporal structure in music: blues improvisation with LSTM recurrent networks. Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002.

[16] Wolfgang Maass, et al. A Model for Real-Time Computation in Generic Neural Microcircuits. NIPS, 2002.

[17] Juan Antonio Pérez-Ortiz, et al. Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Networks, 2003.

[18] J. Glasby. All together now... Nature Reviews Microbiology, 2004.