Multimodal Soft Nonnegative Matrix Co-Factorization for Convolutive Source Separation

In this paper, the problem of convolutive source separation via multimodal soft Nonnegative Matrix Co-Factorization (NMCF) is addressed. Different aspects of a phenomenon may be recorded by sensors of different types (e.g., the audio and video of human speech), and each of these recorded signals is called a modality. Since the modalities arise from the same underlying phenomenon, they share some similarities. In particular, they usually evolve similarly over time: changes in one modality usually correspond to changes in the other, so their active and inactive periods tend to coincide. Under this assumption, the activation coefficient matrices obtained from the Nonnegative Matrix Factorization (NMF) of the modalities are expected to have a similar form. In this paper, this similarity between the activation coefficient matrices of the modalities is exploited for co-factorization. It is incorporated into the separation procedure in a soft manner, through penalty terms, rather than by forcing the activation matrices to be identical, which makes the separation procedure more flexible. Simulation results and comparisons with state-of-the-art algorithms demonstrate the effectiveness of the proposed algorithm.
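The soft coupling of activation matrices can be illustrated with a minimal sketch. It assumes a Euclidean (Frobenius) cost with a squared l2 penalty, i.e. minimizing $\|V_1 - W_1H_1\|_F^2 + \|V_2 - W_2H_2\|_F^2 + \lambda\|H_1 - H_2\|_F^2$ over nonnegative factors, time-aligned modalities with the same number of frames, and Lee-Seung-style multiplicative updates. The paper's actual divergence, penalty and update rules may differ, and the convolutive mixing model is not represented here; the helper `soft_nmcf` and its parameters are hypothetical.

```python
import numpy as np


def soft_nmcf(V1, V2, K, lam=1.0, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical soft NMF co-factorization sketch (not the paper's algorithm).

    Assumed cost: ||V1 - W1 H1||_F^2 + ||V2 - W2 H2||_F^2 + lam * ||H1 - H2||_F^2,
    minimized with multiplicative updates. V1 (F1 x N) and V2 (F2 x N) are
    assumed to be time-aligned, i.e. to share the same number N of frames.
    """
    rng = np.random.default_rng(seed)
    F1, N = V1.shape
    F2, N2 = V2.shape
    if N != N2:
        raise ValueError("both modalities must have the same number of frames")

    W1 = rng.random((F1, K)) + eps
    W2 = rng.random((F2, K)) + eps
    H1 = rng.random((K, N)) + eps
    H2 = rng.random((K, N)) + eps

    def normalize(W, H):
        # Unit-norm dictionary columns; rescale activation rows so W @ H is unchanged.
        # Without this, the penalty could be lowered just by shrinking H and growing W.
        norms = np.linalg.norm(W, axis=0) + eps
        return W / norms, H * norms[:, None]

    W1, H1 = normalize(W1, H1)
    W2, H2 = normalize(W2, H2)

    def cost():
        return (np.sum((V1 - W1 @ H1) ** 2) + np.sum((V2 - W2 @ H2) ** 2)
                + lam * np.sum((H1 - H2) ** 2))

    history = [cost()]
    for _ in range(n_iter):
        # Activation updates: the soft coupling adds lam * H_other to the numerator
        # and lam * H_self to the denominator of the Euclidean NMF update.
        H1 *= (W1.T @ V1 + lam * H2) / (W1.T @ W1 @ H1 + lam * H1 + eps)
        H2 *= (W2.T @ V2 + lam * H1) / (W2.T @ W2 @ H2 + lam * H2 + eps)
        # Dictionary updates: standard Euclidean NMF multiplicative rules.
        W1 *= (V1 @ H1.T) / (W1 @ H1 @ H1.T + eps)
        W2 *= (V2 @ H2.T) / (W2 @ H2 @ H2.T + eps)
        W1, H1 = normalize(W1, H1)
        W2, H2 = normalize(W2, H2)
        history.append(cost())
    return W1, H1, W2, H2, history


if __name__ == "__main__":
    # Synthetic example: two "modalities" driven by the same activations.
    rng = np.random.default_rng(1)
    H_true = rng.random((3, 50))
    V1 = rng.random((20, 3)) @ H_true  # stands in for audio features
    V2 = rng.random((8, 3)) @ H_true   # stands in for video features
    W1, H1, W2, H2, hist = soft_nmcf(V1, V2, K=3, lam=0.5, n_iter=300)
    print("initial cost:", hist[0], "final cost:", hist[-1])
```

Setting `lam=0` decouples the problem into two independent NMFs, while a very large `lam` pushes `H1` and `H2` toward a shared activation matrix, as in hard co-factorization; intermediate values give the flexibility the soft formulation is meant to provide.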
