Framewise phoneme classification with bidirectional LSTM networks

In this paper, we apply bidirectional training to a long short-term memory (LSTM) network for the first time. We also present a modified, full-gradient version of the LSTM learning algorithm. We discuss the significance of framewise phoneme classification for continuous speech recognition, and the validity of using bidirectional networks for online causal tasks. On the TIMIT speech database, we measure the framewise phoneme classification scores of bidirectional and unidirectional variants of both LSTM and conventional recurrent neural networks (RNNs). We find that bidirectional LSTM outperforms both RNNs and unidirectional LSTM.
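The core idea, a bidirectional LSTM producing one class distribution per input frame, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the layer sizes, random weights, and the 13-dimensional "frame" features are assumptions chosen only to make the example self-contained, and training is omitted entirely. One LSTM reads the frame sequence left-to-right, a second reads it right-to-left, and each frame's classification uses the concatenation of both hidden states, so every prediction can depend on the full past and future context.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_layer(x, params):
    """Run one LSTM over a (T, d_in) sequence; return (T, d_hid) hidden states.
    Standard LSTM gate equations, with the bias folded into each weight matrix."""
    Wi, Wf, Wo, Wc = params            # each (d_hid, d_in + d_hid + 1)
    d_hid = Wi.shape[0]
    h = np.zeros(d_hid)                # hidden state
    c = np.zeros(d_hid)                # cell state
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    outs = []
    for x_t in x:
        z = np.concatenate([x_t, h, [1.0]])   # input, recurrent, bias term
        i = sigm(Wi @ z)                      # input gate
        f = sigm(Wf @ z)                      # forget gate
        o = sigm(Wo @ z)                      # output gate
        c = f * c + i * np.tanh(Wc @ z)       # cell update
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

def make_params(d_in, d_hid):
    # random untrained weights, purely for illustration
    return [rng.normal(scale=0.1, size=(d_hid, d_in + d_hid + 1)) for _ in range(4)]

def bidirectional_framewise(x, fwd_params, bwd_params, W_out):
    """Per-frame class probabilities from a bidirectional LSTM."""
    h_fwd = lstm_layer(x, fwd_params)                # left-to-right pass
    h_bwd = lstm_layer(x[::-1], bwd_params)[::-1]    # right-to-left, re-aligned
    h = np.concatenate([h_fwd, h_bwd], axis=1)       # (T, 2 * d_hid)
    logits = h @ W_out                               # (T, n_classes)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # per-frame softmax

# Toy run: 12 frames of 13-dim acoustic features, 5 hypothetical phoneme classes.
T, d_in, d_hid, n_classes = 12, 13, 8, 5
x = rng.normal(size=(T, d_in))
probs = bidirectional_framewise(x, make_params(d_in, d_hid),
                                make_params(d_in, d_hid),
                                rng.normal(scale=0.1, size=(2 * d_hid, n_classes)))
print(probs.shape)  # one distribution over classes per frame
```

A unidirectional variant would simply drop the backward pass and classify from `h_fwd` alone, which is what makes it usable online; the bidirectional version must see the whole utterance (or a look-ahead window) before emitting outputs.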
