Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is therefore generally desirable when the signal is captured by distant microphones, as in hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced to represent the dynamic short-time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on the TVGSM, and then derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each manifested by a short-time speech spectrum defined by a corresponding autocorrelation (AC) vector; the dereverberation algorithm based on this model employs a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are essential for speech dereverberation, and that the prior knowledge embodied in the codebook further improves the quality of the dereverberated speech. We also confirm that the quality of reverberant speech signals can be greatly improved, in terms of spectral-shape and energy time-pattern distortion, using only a short speech signal and a speaker-independent codebook.
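The core idea described above, recovering speech by multichannel linear prediction weighted by a time-varying Gaussian variance model, can be illustrated with a minimal sketch. The function below is an assumption-laden simplification, not the authors' exact algorithm: it operates on the complex STFT coefficients of a single frequency bin, alternately estimates a per-frame source variance (the time-varying Gaussian model) and a delayed multichannel prediction filter for the late reverberation, and subtracts the predicted reverberant component. All names and parameter choices (`delay`, `taps`, the iteration count) are illustrative.

```python
import numpy as np

def ml_mclp_dereverb(X, delay=2, taps=6, iters=3, eps=1e-8):
    """Sketch of variance-weighted multichannel linear prediction.

    X : complex STFT coefficients of one frequency bin, shape (channels, frames).
    Returns the dereverberated reference-channel signal, shape (frames,).
    """
    M, T = X.shape
    # Stack delayed multichannel frames as regressors; the delay skips the
    # direct sound so only late reverberation is predicted and removed.
    Y = np.zeros((M * taps, T), dtype=complex)
    for k in range(taps):
        d = delay + k
        Y[k * M:(k + 1) * M, d:] = X[:, :T - d]
    d_sig = X[0]           # reference channel observation
    est = d_sig.copy()     # current estimate of the clean source
    for _ in range(iters):
        # Time-varying Gaussian source model: per-frame variance estimate.
        lam = np.maximum(np.abs(est) ** 2, eps)
        Yw = Y / lam                        # variance-weighted regressors
        R = Yw @ Y.conj().T                 # weighted correlation matrix
        r = Yw @ d_sig.conj()               # weighted cross-correlation
        g = np.linalg.solve(R + eps * np.eye(M * taps), r)
        est = d_sig - g.conj() @ Y          # subtract predicted late reverb
    return est
```

In this simplified view, maximizing the TVGSM likelihood with fixed variances reduces to a weighted least-squares problem for the prediction filter `g`, and updating the variances from the current estimate realizes the alternating ML optimization; the codebook-based variants in the paper would replace the simple `|est|**2` variance update with a state-dependent spectral model.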
