Rapid Retraining on Speech Data with LSTM Recurrent Networks

A system that could be quickly retrained on different corpora would be of great benefit to speech recognition. Recurrent Neural Networks (RNNs) can transfer knowledge simply by retaining their trained weights and continuing training on new data. In this report, we partition the TIDIGITS database into utterances spoken by men, women, boys, and girls, and successively retrain a Long Short-Term Memory (LSTM) RNN on these subsets. We find that the network rapidly adapts to each new subset and achieves greater accuracy than when trained on it from scratch. This would be useful for applications requiring either cross-corpus adaptation or continually expanding datasets.
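The warm-start idea above, reusing a trained network's weights as the starting point for a new but related subset rather than reinitializing, can be sketched on a toy problem. This is only an illustrative sketch, not the paper's LSTM or TIDIGITS setup: the one-parameter model, the synthetic "corpora", and the `train` helper are all assumptions chosen to make the effect easy to see.

```python
# Illustrative sketch of warm-start retraining (NOT the paper's LSTM):
# fit y ~ w*x by gradient descent on "corpus A", then retrain on a
# similar "corpus B" starting from A's weight instead of from scratch.
# The datasets and hyperparameters are hypothetical.

def train(data, w=0.0, lr=0.01, tol=1e-3, max_steps=10000):
    """Gradient descent on mean-squared error; returns (w, steps to converge)."""
    for step in range(max_steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        if abs(grad) < tol:
            return w, step
        w -= lr * grad
    return w, max_steps

corpus_a = [(x, 2.0 * x) for x in range(1, 6)]  # stand-in for one speaker group
corpus_b = [(x, 2.1 * x) for x in range(1, 6)]  # similar task, slightly shifted

w_a, _ = train(corpus_a)                    # initial training on corpus A
_, steps_warm = train(corpus_b, w=w_a)      # retrain: reuse A's weight
_, steps_cold = train(corpus_b, w=0.0)      # train on B from scratch

# Because the tasks are related, the warm start sits close to B's optimum
# and converges in fewer steps than the cold start.
print(steps_warm, steps_cold)
```

The same mechanism scales to the LSTM case: the weights learned on one speaker group already encode shared acoustic structure, so far fewer updates are needed to adapt them than to relearn everything from a random initialization.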
