Discriminative linear-transform based adaptation using minimum verification error

This paper presents an investigation of the minimum verification error linear regression (MVELR) method for discriminative linear-transform based adaptation. The MVE criterion is used to estimate a set of discriminative linear transformations that achieve the smallest empirical average loss on the given adaptation data. MVELR directly minimizes the total detection error, part of which results from characteristic mismatch in the given adaptation data. In this study, segment-based phonetic detectors, which reflect an important processing layer in speech event detection, are first trained with the conventional maximum likelihood (ML) method and then refined with the general MVE method on the original training data. For comparison, these initial MVE-trained detectors are then adapted with the given adaptation data using two techniques: maximum likelihood linear regression (MLLR) and MVELR. The experiments are performed in a supervised adaptation scenario, and the effectiveness of the adapted detectors is evaluated in terms of the total detection error. Experimental results confirm that the proposed MVELR method considerably reduces the total error rate across all detector categories compared with MLLR.
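To make the MVE criterion concrete, here is a minimal Python sketch under heavily simplified assumptions that do not come from the paper: 1-D features, a single Gaussian for the target class and one for the anti-model with a shared variance, a fixed decision threshold, and a transform restricted to a shared mean offset `b` (the bias part of an affine mean transform mu' = A mu + b, with A fixed to the identity). Each token's loss is a sigmoid of its score pushed toward the wrong side of the threshold, so the average loss is a smoothed count of misses plus false alarms, and plain gradient descent on `b` drives that count down. The names `score`, `mve_loss_and_grad`, `count_errors`, and `mve_adapt_bias`, the model parameters, and the data are all hypothetical; this illustrates the criterion, not the authors' implementation.

```python
import math

# Hypothetical toy detector (not from the paper): 1-D features, one Gaussian for the
# target class, one for the anti-model, shared variance, fixed threshold TAU.
MU_T, MU_A, VAR, TAU = 1.0, -1.0, 1.0, 0.0


def log_gauss(x, mean, var):
    """Log density of the 1-D Gaussian N(x; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, b):
    """Log-likelihood ratio minus threshold after shifting both means by b.

    Shifting the means by b is the bias part of an affine mean transform
    mu' = A mu + b; A is fixed to the identity in this sketch.
    """
    return log_gauss(x, MU_T + b, VAR) - log_gauss(x, MU_A + b, VAR) - TAU


def mve_loss_and_grad(data, b, gamma=1.0):
    """Average smoothed detection error (misses + false alarms) and its derivative w.r.t. b."""
    ds_db = (MU_A - MU_T) / VAR  # same for every x because the variance is shared
    loss = grad = 0.0
    for x, is_target in data:
        # z > 0 means the token lies on the wrong side of the threshold
        dz_ds = -gamma if is_target else gamma
        z = dz_ds * score(x, b)
        l = sigmoid(z)
        loss += l
        grad += l * (1.0 - l) * dz_ds * ds_db
    return loss / len(data), grad / len(data)


def count_errors(data, b):
    """Hard total detection error: misses plus false alarms."""
    return sum((score(x, b) > 0) != is_target for x, is_target in data)


def mve_adapt_bias(data, lr=1.0, steps=200):
    """Estimate the offset b by plain gradient descent on the smoothed MVE loss."""
    b = 0.0  # b = 0 leaves the initial detector unchanged
    for _ in range(steps):
        _, g = mve_loss_and_grad(data, b)
        b -= lr * g
    return b


# Synthetic adaptation data: features offset by roughly +2 relative to the toy models.
targets = [2.6, 3.1, 2.9, 3.4, 2.2]
others = [0.8, 1.3, 1.0, 0.5, 1.7]
data = [(x, True) for x in targets] + [(x, False) for x in others]

b = mve_adapt_bias(data)
print(count_errors(data, 0.0))  # 5 (every non-target token is falsely accepted)
print(count_errors(data, b))    # 0
```

Restricting the transform to one offset keeps the example one-dimensional; the method described above instead estimates a set of full linear transformations. An MLLR-style estimate would choose the transform that maximizes the likelihood of the adaptation data, whereas the loss here counts (smoothly) the detection errors themselves.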
