On the application of hidden Markov models for enhancing noisy speech

A maximum a posteriori approach is proposed for enhancing speech signals that have been degraded by statistically independent additive noise. The approach is based on statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes. Hidden Markov models (HMMs) with mixtures of Gaussian autoregressive (AR) output probability distributions (PDs) are used to model the clean speech signal. The model for the noise process depends on its nature. The parameter set of the HMM is estimated using the Baum or the EM (expectation-maximization) algorithm. The noisy speech is enhanced by reestimating the clean speech waveform using the EM algorithm. Efficient approximations of the training and enhancement procedures are examined. This results in the segmental k-means approach for hidden Markov modeling, in which the state sequence and the parameter set of the model are alternately estimated. Similarly, the enhancement is done by alternate estimation of the state and observation sequences. An approximate improvement of 4.0-6.0 dB in signal-to-noise ratio (SNR) is achieved at 10-dB input SNR.
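
The alternation between state-sequence and parameter estimation described above can be illustrated with a minimal sketch. It is not the paper's implementation: for brevity it assumes scalar Gaussian output densities rather than mixtures of Gaussian AR densities, keeps the initial-state distribution uniform, and uses add-one smoothing on transition counts. The function names `viterbi` and `segmental_kmeans` are hypothetical.

```python
import numpy as np

def viterbi(log_b, log_pi, log_A):
    """Most likely state path given per-frame log-likelihoods log_b[t, s]."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: move from i to j
        back[t] = np.argmax(scores, axis=0)      # best predecessor of each j
        delta = scores[back[t], np.arange(S)] + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):                # backtrack
        path[t - 1] = back[t, path[t]]
    return path

def segmental_kmeans(y, n_states, n_iter=10):
    """Alternately decode the state sequence and re-estimate the parameters."""
    # Assumed initialisation: means from quantiles, shared variance, uniform A.
    means = np.quantile(y, (np.arange(n_states) + 0.5) / n_states)
    var = np.full(n_states, np.var(y) + 1e-6)
    A = np.full((n_states, n_states), 1.0 / n_states)
    pi = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        # Step 1: best state sequence under the current parameters.
        log_b = -0.5 * (np.log(2 * np.pi * var) + (y[:, None] - means) ** 2 / var)
        path = viterbi(log_b, np.log(pi), np.log(A))
        # Step 2: re-estimate parameters from the frames assigned to each state.
        counts = np.ones((n_states, n_states))   # add-one smoothing (assumption)
        np.add.at(counts, (path[:-1], path[1:]), 1.0)
        A = counts / counts.sum(axis=1, keepdims=True)
        for s in range(n_states):
            seg = y[path == s]
            if seg.size > 0:
                means[s] = seg.mean()
                var[s] = seg.var() + 1e-6
    return means, var, A, path
```

Calling `segmental_kmeans(y, 2)` on a one-dimensional array `y` returns the per-state means and variances, the transition matrix, and the decoded state path. The enhancement stage in the abstract follows the same alternating pattern, with the clean observation sequence taking the place of the parameter set.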
