Hidden Markov Models for Speech Recognition

The use of hidden Markov models for speech recognition has become predominant in the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system, in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, and so forth; and the ease of implementation of the overall recognition system. In this expository article, we address the role of statistical methods in this powerful technology as applied to speech recognition and discuss a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations.
