Segment-based phonetic class detection using minimum verification error (MVE) training

In this paper, we investigate the performance of segment-based detectors for three taxonomic sets of acoustic-phonetic classes. Acoustic-phonetic detectors form an important processing layer for speech event decoding in the new detection-based automatic speech recognition paradigm. The detectors are trained within a minimum verification error (MVE) framework, a discriminative approach that differs markedly from conventional maximum likelihood (ML) training. We evaluate performance on the TIMIT database by comparing detectors trained via MVE with detectors trained via ML. The MVE-trained detectors show a substantial reduction in detection error, which demonstrates the effectiveness of discriminative training, and of MVE in particular, in the detection-based approach to speech recognition. Beyond serving as an important processing stage in an overall speech recognition system, these detectors can also be extended to applications in diagnostic information retrieval or recognition rescoring for utterance verification.
