An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

This paper introduces an adaptive, integrated utterance verification (UV) framework based on minimum verification error (MVE) training, intended as a set of solutions suitable for real applications. UV is traditionally considered an add-on procedure to automatic speech recognition (ASR) and is therefore designed separately from the ASR system models. This two-stage approach often fails to cope with a wide range of variations, such as a new speaker or a new acoustic environment that does not match the speaker population or the environment on which the ASR system was trained. We propose an integrated solution to improve overall UV system performance under such conditions. The integration is accomplished by adapting the target model for UV and merging it with the acoustic model for ASR, using the common MVE principle, at each iteration of the recognition stage. The proposed iterative procedure for UV model adaptation also revises the data segmentation and the decoded hypotheses. Under this framework, we obtained marked improvements in both recognition performance and verification performance.
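As a rough illustration of the MVE principle mentioned above, the sketch below computes a smoothed verification error from a log-likelihood ratio between a target model and an anti-model (competing model). It follows the general form commonly used in the MVE literature and is not taken from this paper: the function names, the anti-model score, and the smoothing parameters `gamma` and `theta` are all assumptions made for illustration.

```python
import math

def misverification_measure(target_loglik, anti_loglik, is_correct):
    """Signed distance that is positive when the verifier would err.

    The log-likelihood ratio (LLR) is target minus anti-model score.
    For a correctly recognized utterance, a low LLR is an error (false
    rejection), so the sign is flipped; for a misrecognized utterance,
    a high LLR is an error (false acceptance).
    """
    llr = target_loglik - anti_loglik
    return -llr if is_correct else llr

def mve_loss(d, gamma=1.0, theta=0.0):
    """Sigmoid smoothing of the 0/1 verification error (hypothetical
    gamma controls the slope, theta shifts the decision threshold)."""
    return 1.0 / (1.0 + math.exp(-gamma * (d - theta)))

# Hypothetical scores: a correct hypothesis that the target model favors.
d = misverification_measure(target_loglik=-10.0, anti_loglik=-12.0, is_correct=True)
loss = mve_loss(d)
```

Because the loss is differentiable in the model scores, it can in principle be minimized by gradient-based updates of the model parameters, which is what makes a shared MVE criterion usable for both training and adaptation.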
