Paper information - Automatic speech recognition based on weighted minimum classification error (W-MCE) training method

Bayes decision theory is the foundation of the classical statistical approach to pattern recognition. In most pattern recognition problems, Bayes decision theory is applied under the assumption that the system performance metric is simple error counting, which assigns an identical cost to every recognition error. However, this prevalent performance metric is not desirable in many practical applications. For example, keyword spotting systems require that different "recognition" errors carry different costs. In this paper, we propose an extended framework for the speech recognition problem with non-uniform classification/recognition error cost. As the system performance metric, the recognition error is weighted according to the task objective. Bayes decision theory is applied to this performance metric, and the decision rule with a non-uniform error cost function is derived. We argue that the minimum classification error (MCE) method, after appropriate generalization, is the most suitable training algorithm for designing the "optimal" classifier that minimizes the weighted error rate. We formulate the weighted MCE (W-MCE) algorithm on top of the conventional MCE infrastructure by integrating the error cost and the recognition error count into one objective function. In the context of automatic speech recognition (ASR), we present a variety of training scenarios and weighting strategies under this extended framework. An experimental demonstration on large vocabulary continuous speech recognition is provided to support the effectiveness of our approach.
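To illustrate how a non-uniform error cost can change a Bayes decision, here is a minimal sketch of the generic minimum-expected-cost rule compared with plain maximum-posterior selection. It is not the paper's W-MCE formulation. The function names, the two-class "keyword"/"filler" setup, the posteriors, and the cost values are all hypothetical and chosen only for illustration.

```python
def map_decision(posteriors):
    """Uniform (0-1) cost: pick the class with the highest posterior."""
    return max(range(len(posteriors)), key=lambda j: posteriors[j])


def min_risk_decision(posteriors, cost):
    """Non-uniform cost: pick the decision with the lowest expected cost.

    cost[i][j] is the (hypothetical) cost of deciding class j when the
    true class is i; correct decisions cost 0.
    """
    n = len(posteriors)
    risks = [
        sum(posteriors[i] * cost[i][j] for i in range(n))
        for j in range(n)
    ]
    return min(range(n), key=lambda j: risks[j])


# Toy example (assumed values): class 0 = "keyword", class 1 = "filler".
posteriors = [0.4, 0.6]

# Uniform cost: every error costs 1, so this reduces to the MAP decision.
uniform_cost = [[0, 1],
                [1, 0]]

# Weighted cost: missing a keyword (true 0, decided 1) costs 5,
# while a false alarm (true 1, decided 0) costs 1.
weighted_cost = [[0, 5],
                 [1, 0]]

print(map_decision(posteriors))                      # 1
print(min_risk_decision(posteriors, uniform_cost))   # 1
print(min_risk_decision(posteriors, weighted_cost))  # 0
```

With uniform costs the rule selects the more probable class, "filler". Once a missed keyword is made five times as costly as a false alarm, the expected cost of deciding "filler" (0.4 × 5 = 2.0) exceeds that of deciding "keyword" (0.6 × 1 = 0.6), so the decision flips.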

Biing-Hwang Juang | Qiang Fu
