Speech recognition with continuous-parameter hidden Markov models

Abstract: The acoustic modeling problem in automatic speech recognition is examined from an information-theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is broken down into two steps: a signal-processing step which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a step which models such a sequence. We are primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space. We explore the trade-off between packing information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous-parameter sequences is addressed by investigating a method of parameter estimation which is designed to cope with inaccurate modeling assumptions.
