Audiovisual Speech Separation Based on Independent Vector Analysis Using a Visual Voice Activity Detector

In this paper, we present a way of improving Independent Vector Analysis (IVA) for the blind separation of convolutive mixtures of speech signals. A binary visual voice activity detector based on lip movements first detects the periods of activity and inactivity of one or more speech signals; these detections are then fed into a modified IVA algorithm to achieve the separation. The results show that, in the determined case, this approach improves the separation and identification of sources and converges faster, and that it can also enhance a specific source in an underdetermined mixture.
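
To make the idea of feeding binary activity decisions into IVA concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes a standard natural-gradient IVA update with a spherical Laplacian source prior, and it assumes, purely for illustration, that the visual detector's output gates the score function so that frames in which a source is judged silent do not contribute to that source's update. The function name `iva_step`, the `vad` mask layout, and the step size `lr` are all hypothetical.

```python
import numpy as np

def iva_step(X, W, vad=None, lr=0.1):
    """One natural-gradient IVA update (illustrative sketch only).

    X   : (K, N, T) complex STFT mixtures (K bins, N channels, T frames)
    W   : (K, N, N) complex demixing matrices, one per frequency bin
    vad : (N, T) binary mask, 1 where source n is judged active
          (hypothetical gating scheme); None gives the ungated update
    """
    K, N, T = X.shape
    Y = W @ X                                   # separated outputs, (K, N, T)
    # Norm across all frequency bins: this coupling is what ties the
    # bins of one source together in IVA.
    norm = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0)) + 1e-12   # (N, T)
    phi = Y / norm                              # score function, (K, N, T)
    if vad is not None:
        # Assumption: silence frames of source n are excluded from its update.
        phi = phi * vad
    # Nonlinear correlation E[phi(y) y^H] for each frequency bin.
    R = phi @ np.conj(np.transpose(Y, (0, 2, 1))) / T        # (K, N, N)
    return W + lr * (np.eye(N) - R) @ W

# Usage on random data, only to show the expected shapes.
rng = np.random.default_rng(0)
K, N, T = 4, 2, 50
X = rng.standard_normal((K, N, T)) + 1j * rng.standard_normal((K, N, T))
W0 = np.tile(np.eye(N, dtype=complex), (K, 1, 1))
vad = np.ones((N, T))
vad[1, :25] = 0                                 # source 2 silent in early frames
W_plain = iva_step(X, W0)
W_gated = iva_step(X, W0, vad=vad)
```

With an all-ones mask the gated step reduces to the ungated one, so the mask only changes the update for sources that the detector marks as inactive in some frames.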
