Adaptive Boosted Non-Uniform MCE for Keyword Spotting on Spontaneous Speech

In this work, we present adaptive boosted non-uniform minimum classification error (MCE), a complete discriminative training framework that uses non-uniform criteria for keyword spotting on spontaneous speech. To further improve spotting performance and to address the potential over-training of the non-uniform MCE proposed in our prior work, we make two improvements to the basic MCE optimization procedure. Furthermore, motivated by AdaBoost, we introduce an adaptive scheme that embeds error cost functions together with model combination in the decoding stage. The proposed framework is validated comprehensively on two challenging large-scale spontaneous conversational telephone speech (CTS) tasks in different languages (English and Mandarin), and the experimental results show that it achieves significant and consistent figure of merit (FOM) gains over both maximum likelihood (ML) and discriminatively trained systems.
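To make the two ingredients named above more concrete, here is a minimal sketch of the textbook building blocks, not the authors' method. It shows the classic MCE misclassification measure and sigmoid-smoothed loss in the form introduced by Juang and Katagiri, with each token's loss scaled by an error-cost weight as a stand-in for "non-uniform" criteria, plus one round of textbook discrete AdaBoost reweighting. The function names, the parameters `eta`, `gamma`, `theta`, and the example cost values are hypothetical. The abstract does not specify the paper's two optimization improvements or its adaptive decoding-time scheme, so neither is reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def misclassification_measure(g_correct: float, g_competitors: Sequence[float],
                              eta: float = 1.0) -> float:
    """Textbook MCE measure: d = -g_correct + (1/eta) * log(mean_j exp(eta * g_j)).

    d < 0 means the correct class outscores the soft maximum of its competitors.
    """
    m = max(g_competitors)  # shift by the max for a numerically stable log-mean-exp
    mean_exp = sum(math.exp(eta * (g - m)) for g in g_competitors) / len(g_competitors)
    return -g_correct + m + math.log(mean_exp) / eta


def sigmoid_loss(d: float, gamma: float = 1.0, theta: float = 0.0) -> float:
    """Smoothed 0/1 loss: l(d) = 1 / (1 + exp(-(gamma * d + theta)))."""
    return 1.0 / (1.0 + math.exp(-(gamma * d + theta)))


def non_uniform_mce_loss(tokens: Sequence[Tuple[float, Sequence[float]]],
                         costs: Sequence[float], eta: float = 1.0,
                         gamma: float = 1.0, theta: float = 0.0) -> float:
    """Sum of per-token MCE losses, each scaled by a hypothetical error-cost weight."""
    return sum(c * sigmoid_loss(misclassification_measure(gc, comp, eta), gamma, theta)
               for (gc, comp), c in zip(tokens, costs))


def adaboost_reweight(weights: Sequence[float],
                      misclassified: Sequence[bool]) -> Tuple[float, List[float]]:
    """One round of textbook discrete AdaBoost: returns (alpha, renormalised weights)."""
    total = sum(weights)
    err = sum(w for w, miss in zip(weights, misclassified) if miss) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)  # combination weight of this model
    # Up-weight misclassified samples and down-weight correctly classified ones.
    new = [w * math.exp(alpha if miss else -alpha) for w, miss in zip(weights, misclassified)]
    z = sum(new)
    return alpha, [w / z for w in new]


if __name__ == "__main__":
    # Hypothetical scores: (correct-class score, competitor scores) per token.
    tokens = [(2.0, [1.0, 1.0]), (0.5, [1.5, 0.0])]
    costs = [1.0, 3.0]  # e.g. errors on a keyword token cost three times as much
    print(non_uniform_mce_loss(tokens, costs))
    alpha, w = adaboost_reweight([0.25] * 4, [True, False, False, False])
    print(round(alpha, 4), [round(x, 4) for x in w])  # 0.5493 [0.5, 0.1667, 0.1667, 0.1667]
```

The cost weights only rescale each token's contribution to the smoothed error count. After one AdaBoost round the misclassified samples carry half of the total weight, which is the standard property of that update.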
