Robust utterance verification for connected digits recognition

Utterance verification is an important technology in the design of user-friendly speech recognition systems. This paper addresses the issue of robustness in utterance verification. Four different approaches to robustness have been investigated: a string-based likelihood measure for the detection of non-vocabulary words and "putative" errors, a signal bias removal method for channel normalization, an on-line adaptation technique for achieving a desirable trade-off between false rejections and false alarms, and a discriminative training method for the minimization of the expected string error rate. When these techniques were all integrated into a state-of-the-art connected digit recognition system, the string error rate was found to decrease by up to 57% at a rejection rate of 5%. For non-vocabulary word strings, the proposed utterance verification system rejected over 99.9% of extraneous speech.
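
The trade-off between false rejections and false alarms mentioned above can be illustrated with a minimal sketch. It assumes each recognized string carries a single scalar confidence score and is accepted when that score reaches a threshold. The function name `rejection_tradeoff`, the acceptance rule, and the scores are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple


def rejection_tradeoff(
    correct_scores: Sequence[float],
    incorrect_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (false_rejection_rate, false_alarm_rate) at one threshold.

    Assumption: a string is accepted when its score is >= threshold.
    A false rejection is a correctly recognized string that is rejected;
    a false alarm is a misrecognized or non-vocabulary string that is accepted.
    """
    if not correct_scores or not incorrect_scores:
        raise ValueError("both score lists must be non-empty")
    false_rejections = sum(1 for s in correct_scores if s < threshold)
    false_alarms = sum(1 for s in incorrect_scores if s >= threshold)
    return (
        false_rejections / len(correct_scores),
        false_alarms / len(incorrect_scores),
    )


# Made-up confidence scores, for illustration only.
correct = [2.1, 1.7, 0.9, 3.0, 0.2]
incorrect = [-1.5, 0.4, -0.3, 1.1]

for t in (0.0, 0.5, 1.0):
    fr, fa = rejection_tradeoff(correct, incorrect, t)
    print(f"threshold={t:.1f}  false rejection={fr:.2f}  false alarm={fa:.2f}")
```

Raising the threshold in this toy setting lowers the false alarm rate and raises the false rejection rate, which is the balance an adaptation scheme would have to manage.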
