Paper information - A comparison between spiking and differentiable recurrent neural networks on spoken digit recognition

In this paper we demonstrate that Long Short-Term Memory (LSTM), a differentiable recurrent neural network (RNN), can robustly categorize timewarped speech data. We measure its performance on a spoken digit identification task in which the data were spike-encoded so that classifying the utterances becomes a difficult problem in non-linear timewarping. We find that LSTM gives greatly superior results to a spiking neural network (SNN) from the literature, and conclude that the architecture has a place in domains that require learning from large timewarped datasets, such as automatic speech recognition.
