Paper Information - Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation

Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation

This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain. This contrasts with most current models, which operate in the frequency domain. Exact signal estimation is computationally intractable. We derive three approximations to enhance the efficiency of signal estimation. The Gaussian approximation transforms the log-spectral domain GMM into the frequency domain using a minimum Kullback-Leibler (KL) divergence criterion. The frequency domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude. Correspondingly, the log-spectral domain Laplace method computes the MAP estimator for the log-spectral amplitude. Further, the gain and noise spectrum adaptation are implemented using the expectation-maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer improved signal-to-noise ratio, lower word recognition error rate, and less spectral distortion.
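The Gaussian approximation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation; it rests on the following assumptions, none of which are stated in the abstract: the log-power in each frequency bin is Gaussian within a mixture state, the frequency-domain approximation is a zero-mean complex Gaussian, and the noise variance per bin is already known. Under those assumptions, minimizing the KL divergence from the true density to the zero-mean Gaussian amounts to matching the second moment, so the signal variance is exp(mu + var / 2). The signal estimate is then a Wiener gain averaged over the posterior state probabilities. The function names `lognormal_second_moment` and `enhance_frame`, and all parameter names, are hypothetical.

```python
import math


def lognormal_second_moment(mu, var):
    """E[exp(x)] for x ~ N(mu, var).

    Assumption: this is used as the variance of the zero-mean Gaussian
    that best matches (in the KL sense) a Gaussian over log-power.
    """
    return math.exp(mu + 0.5 * var)


def enhance_frame(y_power, noise_var, weights, mus, variances):
    """Posterior-weighted Wiener gains for one frame (illustrative sketch).

    y_power   : list of |Y_k|^2, the noisy power in each frequency bin
    noise_var : list of noise variances per bin (assumed known here)
    weights   : mixture weights, one per state
    mus       : mus[s][k], mean of log-power for state s, bin k
    variances : variances[s][k], variance of log-power for state s, bin k
    Returns (gains per bin, posterior state probabilities).
    """
    n_states = len(weights)
    n_bins = len(y_power)

    # Map each log-spectral Gaussian to a frequency-domain variance.
    sig_var = [
        [lognormal_second_moment(mus[s][k], variances[s][k]) for k in range(n_bins)]
        for s in range(n_states)
    ]

    # Log posterior of each state: the noisy coefficient is modeled as a
    # zero-mean complex Gaussian with variance (signal + noise).
    log_post = []
    for s in range(n_states):
        lp = math.log(weights[s])
        for k in range(n_bins):
            total = sig_var[s][k] + noise_var[k]
            lp += -math.log(math.pi * total) - y_power[k] / total
        log_post.append(lp)

    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    norm = sum(unnorm)
    post = [u / norm for u in unnorm]

    # Average the per-state Wiener gains under the state posterior.
    gains = [
        sum(post[s] * sig_var[s][k] / (sig_var[s][k] + noise_var[k])
            for s in range(n_states))
        for k in range(n_bins)
    ]
    return gains, post
```

Multiplying each noisy coefficient by its gain gives the enhanced coefficient with the noisy phase kept unchanged. The Laplace-method estimators and the EM-based gain and noise adaptation mentioned in the abstract are not covered by this sketch.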

Terrence J. Sejnowski | Jiucang Hao | Te-Won Lee | Hagai Attias | Srikantan S. Nagarajan
