An RNN based speech recognition system with discriminative training

In our previous work [1], a novel method of using a set of fully connected recurrent neural networks (RNNs) for speech modeling was proposed. Despite the effectiveness of the RNN model in characterizing individual speech units, the system performs less satisfactorily in speech recognition because of poor discrimination between models. In this paper, an efficient discriminative training procedure is developed for the RNN-based recognition system. With discriminative training, each RNN speech model is adjusted to reduce its distance from the designated speech unit while increasing its distances from the others. In addition, a duration-screening process is introduced to enhance the discriminating power of the recognition system. Speaker-dependent recognition experiments have been carried out for 1) 11 isolated Cantonese digits, 2) 58 highly confusable Cantonese CV syllables, and 3) 20 isolated English words. The recognition rates attained are 90.9%, 86.7% and 93.5%, respectively.
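The two ideas above, pulling a model toward its own speech unit while pushing it away from the others, and screening candidates by duration, can be illustrated with a minimal sketch. It is not the training procedure of this paper: the loss below is a generic minimum-classification-error style criterion in the spirit of [5], and every name and parameter (`misclassification_measure`, `discriminative_loss`, `duration_screen`, `eta`, `gamma`, `duration_ranges`) is a hypothetical choice made for illustration. The distances are assumed to be per-model scores already computed for one utterance.

```python
import math

def misclassification_measure(distances, target, eta=2.0):
    """Target distance minus a soft minimum of the competitor distances.

    Negative values mean the target model is closer than its competitors.
    (Assumed form; the paper's exact measure is not given in the abstract.)
    """
    competitors = [d for k, d in enumerate(distances) if k != target]
    m = min(competitors)  # shift by the minimum for numerical stability
    mean_exp = sum(math.exp(-eta * (d - m)) for d in competitors) / len(competitors)
    soft_min = m - math.log(mean_exp) / eta
    return distances[target] - soft_min

def discriminative_loss(distances, target, gamma=1.0, eta=2.0):
    """Sigmoid of the misclassification measure: a smooth 0-to-1 error count."""
    return 1.0 / (1.0 + math.exp(-gamma * misclassification_measure(distances, target, eta)))

def duration_screen(distances, n_frames, duration_ranges):
    """Pick the closest model among those whose (min, max) frame range covers n_frames.

    Falls back to all models if the screen rejects every candidate.
    """
    allowed = [k for k, (lo, hi) in enumerate(duration_ranges) if lo <= n_frames <= hi]
    candidates = allowed if allowed else list(range(len(distances)))
    return min(candidates, key=lambda k: distances[k])

# Hypothetical distances from three models to one utterance
distances = [1.0, 3.0, 5.0]
loss_correct = discriminative_loss(distances, target=0)  # target is the closest model
loss_wrong = discriminative_loss(distances, target=2)    # target is the farthest model
picked = duration_screen([1.0, 0.5, 2.0], n_frames=30,
                         duration_ranges=[(20, 40), (50, 80), (25, 35)])
```

Minimizing such a loss by gradient descent lowers the target distance and raises the nearest competitor distances at the same time, which is the behavior described above. In the last call, the model with the smallest distance is rejected because the utterance length falls outside its assumed duration range.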

[1] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.

[2] P. C. Ching, et al. From phonology and acoustic properties to automatic recognition of Cantonese, 1994, Proceedings of ICSIPNN '94. International Conference on Speech, Image Processing and Neural Networks.

[3] Lai-Wan Chan, et al. Recurrent neural networks for speech modeling and speech recognition, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[4] Douglas D. O'Shaughnessy, et al. Use of minimum duration and energy contour for phonemes to improve large vocabulary isolated-word recognition, 1992.

[5] Biing-Hwang Juang, et al. Discriminative training of dynamic programming based speech recognizers, 1993, IEEE Trans. Speech Audio Process.