A study on rescoring using HMM-based detectors for continuous speech recognition

This paper investigates rescoring performance using hidden Markov model (HMM) based attribute detectors. The minimum verification error (MVE) criterion is employed to enhance the reliability of the detectors in continuous speech recognition. The HMM-based detectors are applied to the recognition candidates, which are generated by a conventional decoder and organized in phone/word graphs. We focus on rescoring performance with detectors trained on the tokens produced by the decoder but labeled with broad phonetic categories rather than phonetic identities. Several training criteria and knowledge fusion methods are investigated under rescoring scenarios at different semantic levels. This research demonstrates several ways of embedding auxiliary information into the current automatic speech recognition (ASR) framework for improved results. It also represents an intermediate step towards the construction of a true detection-based ASR paradigm.
