Audiovisual speech source separation: a regularization method based on visual voice activity detection

Audio-visual speech source separation combines visual speech processing techniques (e.g. lip parameter tracking) with source separation methods to improve and/or simplify the extraction of a speech signal from a mixture of acoustic signals. In this paper, we present a new approach to this problem: visual information is used as a voice activity detector (VAD). Results show that, in the difficult case of realistic convolutive mixtures, the classic problem of the permutation of the output frequency channels can be solved using the visual information, with simpler processing than is needed when only audio information is used.
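
To illustrate how a visual VAD might resolve the permutation ambiguity, the sketch below assumes that a frequency-domain separation stage has already produced one set of output channels per frequency bin, and that the visual VAD supplies a boolean mask marking the frames where the target speaker is silent. In each bin, the output with the lowest power during those silent frames is taken to be the target and moved to channel 0. The function name `align_permutation_with_vad`, the array layout, and the minimum-power criterion are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def align_permutation_with_vad(Y, silent):
    """Reorder per-bin separated outputs using silence frames from a VAD.

    Y      : complex or real array, shape (n_freq, n_src, n_frames),
             separated STFT outputs with arbitrary channel order per bin.
    silent : boolean array, shape (n_frames,), True where the (visual) VAD
             reports that the target speaker is silent.
    Returns (Y_aligned, target_idx), where channel 0 of Y_aligned holds,
    in every bin, the output with the lowest power during silent frames.
    """
    Y = np.asarray(Y)
    silent = np.asarray(silent, dtype=bool)
    if not silent.any():
        raise ValueError("the VAD mask contains no silent frames")

    # Mean power of each output channel over the silent frames only.
    power = np.mean(np.abs(Y[:, :, silent]) ** 2, axis=2)  # (n_freq, n_src)

    # Assumption: the target is the channel that is quietest while the
    # target speaker is known to be silent.
    target_idx = np.argmin(power, axis=1)  # (n_freq,)

    # Swap the detected target channel with channel 0 in every bin.
    f = np.arange(Y.shape[0])
    Y_aligned = Y.copy()
    Y_aligned[f, 0, :] = Y[f, target_idx, :]
    Y_aligned[f, target_idx, :] = Y[f, 0, :]
    return Y_aligned, target_idx
```

This only identifies the target channel; the remaining channels keep whatever order the separation stage gave them, which is sufficient when the goal is to extract a single speaker.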
