Investigation on rescoring using minimum verification error (MVE) detectors

Discriminative training, especially the Minimum Verification Error (MVE) method, plays an important role in detection-based ASR. Discriminative training has also recently been shown to be effective in large vocabulary continuous speech recognition [1]. In this paper, we propose a rescoring framework that fuses MVE-trained detectors with a conventional recognizer, and we show the resulting improvement. The recognizer performs regular Viterbi decoding, generating recognition candidates with corresponding likelihoods in the form of either N-best lists or word graphs. Detectors trained under the MVE criterion conduct hypothesis tests on all test tokens to produce additional scores. A number of linear and non-linear rescoring methods are then presented to combine these two groups of scores. Experiments were conducted on the TIMIT database, and the results indicate that rescoring based on word graphs achieves higher final accuracy than rescoring based on N-best lists. This rescoring framework explores possible ways to combine other independent knowledge sources with a conventional recognizer. Furthermore, it can guide future research on pure detection-based ASR techniques.
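As an illustration of the linear case on N-best lists, the following is a minimal sketch, not the method used in the paper. It assumes that each candidate carries a recognizer log-likelihood and one detector score per token, and that the two are combined by a simple weighted sum. The names `Hypothesis`, `rescore_nbest`, and `weight`, as well as the example scores, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One N-best candidate (hypothetical structure for illustration)."""
    labels: List[str]             # recognized token sequence
    recognizer_score: float       # log-likelihood from Viterbi decoding
    detector_scores: List[float]  # one detector score per token (assumed log-domain)


def combined_score(hyp: Hypothesis, weight: float) -> float:
    """Linear combination of the recognizer score and the summed detector scores."""
    detector_total = sum(hyp.detector_scores)
    return (1.0 - weight) * hyp.recognizer_score + weight * detector_total


def rescore_nbest(nbest: List[Hypothesis], weight: float = 0.5) -> List[Hypothesis]:
    """Return the candidates sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: combined_score(h, weight), reverse=True)


# Made-up scores: the recognizer slightly prefers the first candidate,
# while the detectors clearly prefer the second one.
nbest = [
    Hypothesis(["sh", "iy"], recognizer_score=-10.0, detector_scores=[-3.0, -3.0]),
    Hypothesis(["s", "iy"], recognizer_score=-10.5, detector_scores=[-0.5, -0.5]),
]

best_recognizer_only = rescore_nbest(nbest, weight=0.0)[0]
best_combined = rescore_nbest(nbest, weight=0.5)[0]
```

With `weight=0.0` the ranking reduces to the recognizer's own ordering; a larger `weight` lets the detector scores reorder the list. In practice the weight would be tuned on held-out data, and a word-graph variant would apply the same combination to arcs rather than to whole candidates.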

[1]  Biing-Hwang Juang, et al. Minimum classification error rate methods for speech recognition, 1997, IEEE Trans. Speech Audio Process.

[2]  Biing-Hwang Juang, et al. Segment-based phonetic class detection using minimum verification error (MVE) training, 2005, INTERSPEECH.

[3]  Jinyu Li, et al. A study on knowledge source integration for candidate rescoring in automatic speech recognition, 2005, Proceedings (ICASSP '05), IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Chin-Hui Lee, et al. Towards knowledge-based features for HMM based large vocabulary automatic speech recognition, 2002, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Hermann Ney, et al. Investigations on error minimizing training criteria for discriminative training in automatic speech recognition, 2005, INTERSPEECH.

[6]  Biing-Hwang Juang, et al. Flexible speech understanding based on combined key-phrase detection and verification, 1998, IEEE Trans. Speech Audio Process.