Hidden Markov models: a guided tour

Hidden Markov modeling is a probabilistic technique for the study of time series. Hidden Markov theory permits modeling with any of the classical probability distributions. The costs of implementation are linear in the length of data. Models can be nested to reflect hierarchical sources of knowledge. These and other desirable features have made hidden Markov methods increasingly attractive for problems in language, speech and signal processing. The basic ideas are introduced by elementary examples in the spirit of the Polya urn models. The main tool in hidden Markov modeling is the Baum-Welch (or forward-backward) algorithm for maximum likelihood estimation of the model parameters. This iterative algorithm is discussed both from an intuitive point of view as an exercise in the art of counting and from a formal point of view via the information-theoretic Q-function. Selected examples drawn from the literature illustrate how the Baum-Welch technique places a rich variety of computational models at the disposal of the researcher.
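The "art of counting" view of Baum-Welch can be illustrated with a minimal sketch for a discrete-output model. This is not taken from the paper: the function names (`forward_backward`, `baum_welch_step`), the per-step scaling scheme, and the toy parameters are all assumptions chosen for illustration. Each pass visits every observation once, which is where the linear cost in the length of the data comes from; re-estimation then normalizes expected counts of transitions and emissions.

```python
import math


def forward_backward(pi, A, B, obs):
    """Scaled forward-backward pass for a discrete HMM (illustrative sketch).

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    obs     : list of observed symbol indices
    Returns (alpha, beta, scales, log_likelihood).
    """
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    scales = [0.0] * T

    # Forward pass: one O(n^2) update per observation, so linear in T.
    for t in range(T):
        for j in range(n):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        scales[t] = sum(alpha[t])  # normalizer; keeps values from underflowing
        alpha[t] = [a / scales[t] for a in alpha[t]]

    # Backward pass, reusing the forward normalizers.
    beta[T - 1] = [1.0] * n
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            ) / scales[t + 1]

    log_likelihood = sum(math.log(c) for c in scales)
    return alpha, beta, scales, log_likelihood


def baum_welch_step(pi, A, B, obs):
    """One re-estimation step: replace parameters by normalized expected counts."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales, _ = forward_backward(pi, A, B, obs)

    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]

    trans = [[0.0] * n for _ in range(n)]  # expected i -> j transition counts
    emit = [[0.0] * m for _ in range(n)]   # expected counts of symbol k in state i
    for t in range(T):
        for i in range(n):
            emit[i][obs[t]] += gamma[t][i]
            if t < T - 1:
                for j in range(n):
                    trans[i][j] += (
                        alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                        / scales[t + 1]
                    )

    new_pi = gamma[0][:]
    new_A = [[x / sum(row) for x in row] for row in trans]
    new_B = [[x / sum(row) for x in row] for row in emit]
    return new_pi, new_A, new_B


# Toy two-state, two-symbol model (hypothetical numbers).
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

ll_before = forward_backward(pi0, A0, B0, obs)[3]
pi1, A1, B1 = baum_welch_step(pi0, A0, B0, obs)
ll_after = forward_backward(pi1, A1, B1, obs)[3]
```

In this sketch, `ll_after` should be no smaller than `ll_before`, which is the monotonicity property that makes the iteration usable for maximum likelihood estimation. The sketch assumes strictly positive starting parameters; a state with zero expected occupancy would need separate handling.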
