Discriminative Training for direct minimization of deletion, insertion and substitution errors

In this paper, we follow the minimum error principle for acoustic modeling and formulate separate error objectives for insertion, deletion, and substitution to be minimized during training. This new training paradigm, generalized from the minimum verification error (MVE) criterion, makes explicit the direct relationship between recognition errors and detection errors by re-interpreting deletions, insertions, and substitutions as misses, false alarms, and co-occurring miss/false-alarm errors, respectively. Under the MVE criterion, by selectively applying the two mis-verification measures for miss and false-alarm errors according to the recognition error type, we develop three individual training criteria, minimum deletion error (MDE), minimum insertion error (MIE), and minimum substitution error (MSE), each of whose objective functions directly minimizes one of the three recognition error types. Experimental results on the TIMIT phone recognition task confirm that each of the MDE, MIE, and MSE criteria primarily minimizes its target error type. Furthermore, a simple combination of the individual criteria outperforms conventional string-based minimum classification error (MCE) training in overall recognition error rate.
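The three error types discussed above are conventionally tallied from a Levenshtein alignment between the reference and hypothesis phone strings. The sketch below illustrates that accounting only; it is not the paper's training procedure (which operates on mis-verification measures during model estimation), and the function name and weighting scheme are hypothetical.

```python
def count_errors(ref, hyp):
    """Tally (substitutions, deletions, insertions) from a minimum-cost
    Levenshtein alignment of a reference sequence against a hypothesis."""
    n, m = len(ref), len(hyp)
    # Each DP cell stores (total_cost, n_sub, n_del, n_ins).
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # all remaining reference symbols deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # all remaining hypothesis symbols inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            mismatch = int(ref[i - 1] != hyp[j - 1])
            sub_step = (c + mismatch, s + mismatch, d, ins)
            c, s, d, ins = dp[i - 1][j]
            del_step = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            ins_step = (c + 1, s, d, ins + 1)
            dp[i][j] = min(sub_step, del_step, ins_step)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def weighted_error(ref, hyp, w_sub=1.0, w_del=1.0, w_ins=1.0):
    """Hypothetical weighted combination of the three error counts,
    loosely analogous to combining per-error-type objectives."""
    s, d, ins = count_errors(ref, hyp)
    return w_sub * s + w_del * d + w_ins * ins
```

For example, aligning reference `"abcd"` against hypothesis `"abxd"` yields one substitution and no deletions or insertions; adjusting the weights emphasizes one error type over the others, mirroring the idea of targeting each error type with its own criterion.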
