A Maximum Likelihood Approach to Unsupervised Online Adaptation of Stochastic Vector Mapping Function for Robust Speech Recognition

In the past several years, we have been studying feature transformation approaches for robust automatic speech recognition (ASR) based on the concept of stochastic vector mapping (SVM), which compensates for possible "distortions" caused by factors irrelevant to phonetic classification in both the training and recognition stages. Although we have demonstrated the usefulness of SVM-based approaches in several robust ASR applications where diversified yet representative training data are available, their performance improvement is less significant when there is a severe mismatch between training and testing conditions. In this paper, we present a maximum likelihood approach to unsupervised online adaptation (OLA) of the SVM function parameters on an utterance-by-utterance basis to achieve further performance improvement. Its effectiveness is confirmed by evaluation experiments on the Finnish Aurora3 database.
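
To make the idea of per-utterance ML adaptation of SVM parameters concrete, here is a minimal sketch under several assumptions that the abstract does not state. First, the SVM function is assumed to take a common posterior-weighted bias form, x = y + sum_k p(k|y) b_k, where p(k|y) comes from a diagonal-covariance GMM over distorted features y and only the bias vectors b_k are adapted. Second, each frame of the test utterance is assumed to be aligned (for example by a first-pass decode) to a single HMM Gaussian N(mu_t, diag(var_t)), which is what makes the adaptation unsupervised. Third, the Jacobian of the mapping is ignored, so maximizing the likelihood reduces to a weighted least-squares problem per feature dimension. All function names (`posteriors`, `svm_map`, `adapt_biases`) and the small `ridge` term are hypothetical illustrations, not the paper's method.

```python
import math

def gauss_logpdf_diag(y, mean, var):
    """Log density of y under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))

def posteriors(y, weights, means, variances):
    """p(k | y) under an (assumed) diagonal-covariance GMM, computed stably."""
    logs = [math.log(w) + gauss_logpdf_diag(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]

def svm_map(y, post, biases):
    """Assumed SVM form: x = y + sum_k p(k|y) * b_k."""
    return [yd + sum(p * b[d] for p, b in zip(post, biases))
            for d, yd in enumerate(y)]

def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def adapt_biases(frames, posts, target_means, target_vars, ridge=1e-6):
    """Hypothetical per-utterance ML re-estimation of the biases b_k.

    Maximizes sum_t log N(svm_map(y_t); mu_t, diag(var_t)) over b_k, with the
    Jacobian ignored. For each dimension d this is the K x K linear system
    sum_t gamma_t gamma_t^T b_d / var_td = sum_t gamma_t (mu_td - y_td) / var_td.
    `ridge` keeps the system solvable for components with little posterior mass.
    """
    K, D = len(posts[0]), len(frames[0])
    biases = [[0.0] * D for _ in range(K)]
    for d in range(D):
        a = [[ridge if i == j else 0.0 for j in range(K)] for i in range(K)]
        rhs = [0.0] * K
        for y, g, mu, var in zip(frames, posts, target_means, target_vars):
            w = 1.0 / var[d]
            for i in range(K):
                rhs[i] += w * g[i] * (mu[d] - y[d])
                for j in range(K):
                    a[i][j] += w * g[i] * g[j]
        for k, b in enumerate(solve(a, rhs)):
            biases[k][d] = b
    return biases
```

As a usage example, if every aligned target mean sits a constant 0.5 above the observed 1-D feature, the re-estimated biases should all come out close to 0.5, so the mapped features shift by about that amount. Solving the normal equations in closed form, rather than by gradient steps, is a design choice that keeps each utterance's update cheap when the number of mixture components is small.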
