Automatic Speech Recognition Based on Non-Uniform Error Criteria

Bayes decision theory is the foundation of the classical statistical pattern recognition approach, with the expected error as the performance objective. For most pattern recognition problems, the “error” is conventionally assumed to be binary, i.e., 0 or 1, equivalent to error counting, independent of the specifics of the error made by the system. The term “error rate” has thus long been regarded as the prevalent system performance measure. This performance measure, nonetheless, may not be satisfactory in many practical applications. In automatic speech recognition, for example, it is well known that some errors are more detrimental (e.g., more likely to lead to misunderstanding of the spoken sentence) than others. In this paper, we propose an extended framework for the speech recognition problem with non-uniform classification/recognition error costs that can be controlled by the system designer. In particular, we address the issue of system model optimization when the cost of a recognition error is class dependent. We formulate the problem within the framework of the minimum classification error (MCE) method, after appropriate generalization to integrate the class-dependent error cost into one consistent objective function for optimization. We present a variety of training scenarios for automatic speech recognition under this extended framework. Experimental results for continuous speech recognition are provided to demonstrate the effectiveness of the new approach.
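To make the idea concrete, the following is a minimal sketch of how a class-dependent cost can be folded into a conventional MCE objective. The notation (the class discriminants $g_i$, misclassification measure $d_i$, cost weights $\epsilon_i$, and smoothing parameters $\eta$, $\gamma$) follows the standard MCE formulation and is our assumption for illustration; it is not necessarily the exact generalization developed in the paper:

\[
d_i(x;\Lambda) = -g_i(x;\Lambda) + \frac{1}{\eta}\log\!\left[\frac{1}{M-1}\sum_{j\neq i} e^{\eta\, g_j(x;\Lambda)}\right],
\qquad
\ell(d) = \frac{1}{1+e^{-\gamma d}},
\]
\[
L(\Lambda) = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{M} \epsilon_i\,\ell\big(d_i(x_n;\Lambda)\big)\,\mathbb{1}\!\left[x_n \in \mathcal{C}_i\right].
\]

Setting all $\epsilon_i = 1$ recovers the uniform error count that conventional MCE approximates; assigning larger $\epsilon_i$ to classes whose misrecognition is more damaging biases the gradient-based parameter update toward reducing those errors first.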
