A Contrast Function Based on Generalized Divergences for Solving the Permutation Problem in Convolved Speech Mixtures

In this paper, we propose a method for solving the permutation problem that is inherent in the separation of convolved mixtures of speech signals in the time-frequency domain. The proposed method obtains the solution through maximization of a contrast function that exploits the similarity of the temporal envelope of the speech spectrum. For this purpose, the contrast calculation uses a global measure of similarity based on the recently developed family of generalized Alpha-Beta divergences, which depend on two tuning parameters, alpha and beta. This parameterization is exploited to best measure the similarity of the speech spectrum and to obtain solutions that are robust against noise and outliers. The ability of this contrast function to solve the permutation problem is supported by a theoretical study that shows that for a simple time-frequency speech model, the contrast value reaches its maximum when the estimated components are properly aligned. Several performance studies demonstrate that the proposed method maintains a high level of permutation correction accuracy in a wide variety of acoustic environments. Moreover, it produces better results than other state-of-the-art methods for solving permutations in highly reverberant environments.
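To illustrate how an Alpha-Beta divergence can score the similarity of temporal envelopes, the following is a minimal sketch and not the contrast function proposed in the paper. It assumes the standard form of the AB divergence for the case where alpha, beta and alpha + beta are all nonzero. The names `ab_divergence` and `best_permutation` are hypothetical. Aligning one frequency bin against a set of reference envelopes by exhaustive search over permutations is a simplifying assumption made here for illustration only.

```python
import itertools

import numpy as np


def ab_divergence(p, q, alpha=1.0, beta=1.0):
    """AB divergence between nonnegative arrays p and q.

    Assumes alpha, beta and alpha + beta are all nonzero; the limiting
    cases (e.g. Kullback-Leibler, Itakura-Saito) are not handled here.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = alpha + beta
    if alpha == 0 or beta == 0 or s == 0:
        raise ValueError("this sketch needs alpha, beta, alpha + beta != 0")
    terms = p**alpha * q**beta - (alpha / s) * p**s - (beta / s) * q**s
    return -terms.sum() / (alpha * beta)


def best_permutation(reference, estimates, alpha=1.0, beta=1.0):
    """Return the permutation of rows in `estimates` that best matches `reference`.

    Both inputs are (n_sources, n_frames) arrays of envelope magnitudes.
    Illustrative assumption: the best alignment is the one with the lowest
    total divergence, found by brute force over all permutations.
    """
    reference = np.asarray(reference, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    n = reference.shape[0]

    def cost(perm):
        return sum(
            ab_divergence(estimates[perm[i]], reference[i], alpha, beta)
            for i in range(n)
        )

    return min(itertools.permutations(range(n)), key=cost)


# Toy envelopes for two sources; the estimates are swapped and slightly scaled.
reference = np.array([[1.0, 4.0, 2.0, 5.0],
                      [3.0, 1.0, 4.0, 1.0]])
estimates = 1.1 * reference[[1, 0]]
print(best_permutation(reference, estimates))  # (1, 0)
```

With alpha = beta = 1 the divergence reduces to half the squared Euclidean distance, so the toy example is easy to check by hand. Other parameter choices change how strongly large or small envelope values weigh in the comparison, which is the tuning freedom the abstract refers to.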
