Classifier design for verification of multi-class recognition decision

This paper investigates a two-class classifier approach to improving word verification performance. The classifier operates on a discriminant function that is a linear combination of the smoothed likelihood ratios for the N-best candidates and for the background (BG) and out-of-vocabulary (OOV) filler models, and it is optimized by discriminative training to minimize the classification error. We discuss several strategies involving the likelihood-ratio-based formulation and the use of the N-best candidates and the BG and OOV models in the classifier. In word verification experiments on a connected-digit database of utterances recorded in a moving car with a hands-free microphone, the likelihood-ratio-based formulation achieved a relative error reduction of 35% compared with a likelihood-based formulation. In addition, the use of the N-best candidates and the BG and OOV models improved performance, giving a relative error reduction of roughly 10%.
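The general idea of a linear discriminant over likelihood ratios, trained to minimize a smoothed classification error, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the sigmoid-smoothed error count, the hyperparameters, and the synthetic data are all hypothetical, and the abstract does not specify how the likelihood ratios are smoothed, so plain log-likelihood ratios are used here.

```python
import numpy as np

def llr_features(loglik_top, loglik_alternatives):
    """Log-likelihood ratio of the hypothesized word against each alternative
    model (e.g. N-best competitors, BG and OOV fillers), one column per model."""
    return loglik_top[:, None] - loglik_alternatives

def train_verifier(features, labels, gamma=1.0, lr=0.1, epochs=200):
    """Fit a linear discriminant d(x) = w.x + b by gradient descent on a
    sigmoid-smoothed error count (an assumed stand-in for the paper's criterion).

    labels: +1 for a correct recognition decision, -1 for an incorrect one.
    """
    n, dim = features.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(epochs):
        d = features @ w + b
        # Smooth 0/1 loss: close to 1 when labels * d is strongly negative.
        loss = 1.0 / (1.0 + np.exp(gamma * labels * d))
        grad_d = -gamma * labels * loss * (1.0 - loss)
        w -= lr * (features.T @ grad_d) / n
        b -= lr * grad_d.mean()
    return w, b

def verify(features, w, b):
    """Accept the recognition decision when the discriminant is positive."""
    return features @ w + b > 0.0

# Synthetic stand-in data (not from the paper): correct decisions score about
# 2 nats above each alternative model, incorrect ones about 2 nats below.
rng = np.random.default_rng(0)
n = 400
labels = np.where(np.arange(n) < n // 2, 1.0, -1.0)
loglik_top = rng.normal(-50.0, 5.0, size=n)
loglik_alt = loglik_top[:, None] - 2.0 * labels[:, None] + rng.normal(0.0, 1.0, size=(n, 3))

features = llr_features(loglik_top, loglik_alt)
w, b = train_verifier(features, labels)
accepted = verify(features, w, b)
```

Using ratios rather than raw likelihoods removes the utterance-dependent offset in `loglik_top`, which is one plausible reason a likelihood-ratio formulation is easier to separate with a single linear boundary.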
