Individual Error Minimization Learning Framework and its Applications to Speech Recognition and Utterance Verification

In this paper, we extend the individual recognition error minimization criteria, MDE/MIE/MSE [1], to the word level and apply them to word recognition and verification tasks. To reduce potential word-level errors effectively, we expand the training token selection scheme to suit the word-level learning framework, taking neighboring words into account and covering the internal phonemes of each training word. We then evaluate the proposed word-level learning criteria on the TIMIT word recognition task and further investigate how well each type of recognition error is rejected in utterance verification (UV). Experimental results confirm that each word-level objective criterion primarily reduces its corresponding target error type. The rejection rates of insertion and substitution errors also improve under the MIE and MSE criteria, which leads to an additional reduction in word error rate after rejection.
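The three error types that the criteria target (deletions, insertions, and substitutions) are the same categories that a standard word-level edit-distance alignment produces when a hypothesis is scored against a reference. The sketch below shows only that generic bookkeeping; it is not the training criteria or the token selection scheme described in the abstract, and the function names `count_errors` and `word_error_rate` are hypothetical, chosen for illustration.

```python
def count_errors(ref, hyp):
    """Return (substitutions, deletions, insertions) for the minimum
    edit-distance alignment of hypothesis words against reference words."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # only deletions are possible
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # only insertions are possible
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, k)               # match, no error
            else:
                diag = (t + 1, s + 1, d, k)       # substitution
            t, s, d, k = cost[i - 1][j]
            dele = (t + 1, s, d + 1, k)           # reference word missing
            t, s, d, k = cost[i][j - 1]
            ins = (t + 1, s, d, k + 1)            # extra hypothesis word
            # Tuples compare by total error count first.
            cost[i][j] = min(diag, dele, ins)
    _, s, d, k = cost[n][m]
    return s, d, k


def word_error_rate(ref, hyp):
    """Word error rate: (S + D + I) divided by the number of reference words."""
    s, d, k = count_errors(ref, hyp)
    return (s + d + k) / len(ref)
```

Breaking the total down this way is what makes it possible to check, per error type, whether a given criterion mainly reduces the errors it targets.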

[1] Eduardo Lleida, et al. Utterance verification in continuous speech recognition: decoding and training procedures, 2000, IEEE Trans. Speech Audio Process.

[2] Joost van Doremalen, et al. Utterance verification in language learning applications, 2009, SLaTE.

[3] Biing-Hwang Juang, et al. Discriminative linear-transform based adaptation using minimum verification error, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] Biing-Hwang Juang, et al. Discriminative utterance verification for connected digits recognition, 1995, IEEE Trans. Speech Audio Process.

[5] Biing-Hwang Juang, et al. Discriminative Training for direct minimization of deletion, insertion and substitution errors, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Jonathan G. Fiscus, et al. REDUCED WORD ERROR RATES, 1997.

[7] Qiang Fu, et al. A generalization of the minimum classification error (MCE) training method for speech recognition and detection, 2008.

[8] Hsiao-Wuen Hon, et al. Speaker-independent phone recognition using hidden Markov models, 1989, IEEE Trans. Acoust. Speech Signal Process.

[9] Chin-Hui Lee, et al. String-based minimum verification error (SB-MVE) training for speech recognition, 1997, Comput. Speech Lang.

[10] Biing-Hwang Juang, et al. An overview on automatic speech attribute transcription (ASAT), 2007, INTERSPEECH.

[11] Maxine Eskénazi, et al. An overview of spoken language technology for education, 2009, Speech Commun.

[12] Aaron E. Rosenberg, et al. Speaker verification using minimum verification error training, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[13] Biing-Hwang Juang, et al. A study on rescoring using HMM-based detectors for continuous speech recognition, 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[14] Biing-Hwang Juang, et al. An Adaptive Utterance Verification Framework Using Minimum Verification Error Training, 2011.