A comparison between spiking and differentiable recurrent neural networks on spoken digit recognition

In this paper we demonstrate that Long Short-Term Memory (LSTM), a differentiable recurrent neural network (RNN), can robustly categorize time-warped speech data. We measure its performance on a spoken digit identification task in which the data were spike-encoded in such a way that classifying the utterances became a difficult problem in non-linear time-warping. We find that LSTM gives greatly superior results to a spiking neural network (SNN) from the literature, and conclude that the architecture has a place in domains that require learning from large time-warped datasets, such as automatic speech recognition.
