Two multimodal approaches for single microphone source separation

This paper addresses single microphone source separation via Nonnegative Matrix Factorization (NMF) by exploiting video information. The audio and video modalities of a single speaker usually vary together in time: a change in one usually corresponds to a change in the other. The activation coefficient matrices of their NMF decompositions are therefore expected to be similar. Based on this similarity, the activation coefficient matrix of the video modality is used as an initialization for audio source separation via NMF. The same similarity is also used for post-processing, to cluster the rows of the activation coefficient matrix obtained from randomly initialized NMF. Simulation results confirm the effectiveness of the proposed multimodal approaches for single microphone source separation.
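The first approach, initializing the audio NMF with the video activation matrix, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the standard multiplicative updates for the Euclidean cost, audio and video features sampled at the same frame rate, and synthetic data in place of real recordings. The names `nmf`, `H_init`, `V_audio`, and `V_video` are hypothetical.

```python
import numpy as np

def nmf(V, rank, H_init=None, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V as W @ H with multiplicative updates
    (Euclidean cost). If H_init is given, it replaces the random
    initialization of the activation matrix H."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    if H_init is None:
        H = rng.random((rank, n_cols)) + eps
    else:
        H = H_init.copy() + eps  # copy so the caller's array is not modified
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-ins (assumption): both modalities share the same
# underlying activations over time, as the abstract argues for speech.
rng = np.random.default_rng(1)
n_frames, rank = 120, 4
H_true = rng.random((rank, n_frames))
V_video = rng.random((30, rank)) @ H_true  # hypothetical nonnegative video features
V_audio = rng.random((64, rank)) @ H_true  # hypothetical magnitude spectrogram

# Step 1: decompose the video modality with a random initialization.
_, H_video = nmf(V_video, rank)

# Step 2: decompose the audio, starting from the video activations.
W_audio, H_audio = nmf(V_audio, rank, H_init=H_video)

rel_err = np.linalg.norm(V_audio - W_audio @ H_audio) / np.linalg.norm(V_audio)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch only the starting point of the audio factorization changes; the update rules are the same as for a randomly initialized NMF, so any benefit would come from starting closer to activations that follow the speaker's visible activity.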
