Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction

This paper proposes a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal captured by distant microphones without prior knowledge of the room impulse responses. In this approach, the generative model of the captured signal is composed of a source process, which is assumed to be a Gaussian process with a time-varying variance, and an observation process modeled by delayed linear prediction (DLP). The optimization objective for the dereverberation problem is shown to be the sum of the squared prediction errors normalized by the source variances; hence, this approach is referred to as variance-normalized delayed linear prediction (NDLP). Inheriting the characteristics of DLP, NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting the direct speech signal. In addition, owing to the use of variance normalization, NDLP improves the dereverberation result, especially with relatively short observations (on the order of a few seconds). Furthermore, NDLP can be implemented in a computationally efficient manner in the time-frequency domain. Experimental results demonstrate the effectiveness and efficiency of the proposed approach in comparison with two existing approaches.
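The objective described above, the sum of squared delayed-prediction errors divided by the source variances, can be illustrated with a minimal single-channel sketch. This is not the authors' implementation. The function name `ndlp`, its parameters, the block-wise variance estimate, the alternating update schedule, and the small diagonal loading term are all assumptions made for illustration. The input may be a time-domain signal or the sequence of coefficients in one frequency bin.

```python
import numpy as np

def ndlp(x, delay=3, order=4, block=32, iterations=3, eps=1e-8):
    """Variance-normalized delayed linear prediction on a 1-D sequence.

    x: observed (reverberant) samples, real or complex, shape (T,).
    Returns (d, g): the prediction residual, used as the dereverberated
    estimate, and the prediction coefficients.
    All parameter names and defaults are hypothetical.
    """
    x = np.asarray(x)
    x = x.astype(complex if np.iscomplexobj(x) else float)
    T = x.shape[0]

    # Column k holds x delayed by (delay + k) samples. The gap of `delay`
    # samples keeps the predictor from cancelling the direct signal.
    X = np.zeros((T, order), dtype=x.dtype)
    for k in range(order):
        shift = delay + k
        if shift < T:
            X[shift:, k] = x[:T - shift]

    d = x.copy()                      # initial source estimate: the observation
    g = np.zeros(order, dtype=x.dtype)
    for _ in range(iterations):
        # Assumed variance estimate: mean residual power within short blocks.
        power = np.abs(d) ** 2
        var = np.empty(T)
        for start in range(0, T, block):
            var[start:start + block] = power[start:start + block].mean()
        var = np.maximum(var, eps)    # floor to avoid division by zero

        # Minimize sum_t |x[t] - X[t] @ g|^2 / var[t] (weighted least squares).
        Xw = X / var[:, None]
        R = Xw.conj().T @ X
        r = Xw.conj().T @ x
        g = np.linalg.solve(R + eps * np.eye(order), r)  # assumed diagonal loading

        d = x - X @ g                 # prediction error = dereverberated estimate
    return d, g
```

Calling `d, g = ndlp(x)` returns the residual and the fitted coefficients. Each pass re-estimates the variances from the current residual and then re-solves the weighted least-squares problem, so frames with low source power are not drowned out by loud ones when the predictor is fitted.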
