Statistical and Discriminative Methods for Speech Recognition

A critical component of the pattern-matching approach to speech recognition is the training algorithm, which aims to produce typical (reference) patterns or models for accurate pattern comparison. In this chapter, we discuss speech recognizer training from a broad perspective rooted in classical Bayes decision theory. We distinguish classifier design by distribution estimation from discriminative training, motivated by the fact that in many realistic applications, such as speech recognition, the form of the true signal distribution is rarely known precisely. We argue that traditional methods relying on distribution estimation are suboptimal when the assumed distribution form is not the true one, and that “optimality” in distribution estimation does not automatically translate into “optimality” in classifier design. We compare the two methods in the context of hidden Markov modeling for speech recognition, and we show the superiority of the discriminative method over the distribution estimation method with the results of several key speech recognition experiments.
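The gap between good distribution estimation and good classification can be seen in a toy one-dimensional problem. The sketch below is not the chapter's method and not an HMM: it assumes two equiprobable classes, fits a Gaussian with a shared variance to each class by maximum likelihood (so the plug-in decision boundary is the midpoint of the two sample means), and compares that boundary with a threshold chosen directly to minimize the empirical error count. The data generator, the function names, and the exhaustive threshold search (used here in place of a smoothed, gradient-based minimum-error procedure) are all illustrative assumptions.

```python
import random


def make_data(n, seed=0):
    """Draw n samples per class. Class 0 is deliberately NOT Gaussian."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # Class 0: N(0, 1) contaminated by 20% outliers from N(-10, 1).
        if rng.random() < 0.2:
            x0 = rng.gauss(-10.0, 1.0)
        else:
            x0 = rng.gauss(0.0, 1.0)
        xs.append(x0)
        ys.append(0)
        # Class 1: N(2, 1).
        xs.append(rng.gauss(2.0, 1.0))
        ys.append(1)
    return xs, ys


def error_rate(xs, ys, t):
    """Empirical error of the rule 'predict class 1 if x > t'."""
    wrong = sum((x > t) != (y == 1) for x, y in zip(xs, ys))
    return wrong / len(xs)


def ml_threshold(xs, ys):
    """Plug-in boundary under the (wrong) shared-variance Gaussian model.

    With equal priors and a shared variance, the boundary is the midpoint
    of the two maximum-likelihood class means.
    """
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2.0


def min_error_threshold(xs, ys):
    """Threshold minimizing the empirical error count (exhaustive search)."""
    candidates = [float("-inf")] + sorted(xs)
    return min(candidates, key=lambda t: error_rate(xs, ys, t))


if __name__ == "__main__":
    xs, ys = make_data(200)
    t_ml = ml_threshold(xs, ys)
    t_disc = min_error_threshold(xs, ys)
    print("ML plug-in threshold:", t_ml, "error:", error_rate(xs, ys, t_ml))
    print("Min-error threshold: ", t_disc, "error:", error_rate(xs, ys, t_disc))
```

Because the outliers pull the class 0 mean estimate to the left, the plug-in boundary moves away from the region where the two classes actually overlap, even though each mean is the correct maximum-likelihood estimate under the assumed model. The directly optimized threshold can never do worse on the training set, since the plug-in rule belongs to the family it searches over; how well it generalizes to new data is a separate question that this sketch does not address.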
