When silence is gold

Audiovisual source separation is an attractive approach to source extraction. Several algorithms have already been proposed for extracting speech sources from audio mixtures by exploiting audiovisual coherence. One of the main properties of speech signals is that they are highly non-stationary: there are periods during which speakers do not produce sounds. In this work, audiovisual coherence is used to estimate these silent periods, which are then used to extract the corresponding speech signals.
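To illustrate why known silent periods help extraction, here is a minimal sketch and not necessarily the algorithm of this work. It assumes a linear instantaneous mixture with as many sensors as sources, and it assumes the silence mask is already available (in the setting above it would be estimated from video). The function name `extract_intermittent_source` and the demo signals are hypothetical. The idea: an extraction vector that cancels every other source gives zero output power while the target is silent, so it can be found as the minimizer of a generalized Rayleigh quotient between the "silent" covariance and the overall covariance.

```python
import numpy as np

def extract_intermittent_source(x, silent_mask):
    """Extract the source that is silent where silent_mask is True.

    x           : (n_channels, n_samples) array of mixtures (assumed instantaneous).
    silent_mask : boolean (n_samples,) array, True where the target is silent.
    Returns the extracted signal, up to an unknown scale and sign.
    """
    x = x - x.mean(axis=1, keepdims=True)
    r_all = x @ x.T / x.shape[1]          # covariance over the whole record
    xs = x[:, silent_mask]
    r_sil = xs @ xs.T / xs.shape[1]       # covariance while the target is silent

    # Solve r_sil w = lambda r_all w by whitening with the Cholesky factor of r_all.
    chol_inv = np.linalg.inv(np.linalg.cholesky(r_all))
    eigvals, eigvecs = np.linalg.eigh(chol_inv @ r_sil @ chol_inv.T)

    # eigh sorts eigenvalues in ascending order: the first eigenvector gives the
    # direction whose output power is smallest during the silent periods.
    w = chol_inv.T @ eigvecs[:, 0]
    return w @ x

# Synthetic demo (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 20000
silent = np.zeros(n, dtype=bool)
silent[5000:10000] = True                 # target speaker silent in this span
s1 = rng.standard_normal(n)
s1[silent] = 0.0                          # intermittent target source
s2 = rng.standard_normal(n)               # interfering source, always active
mixing = np.array([[1.0, 0.6], [0.5, 1.0]])
x = mixing @ np.vstack([s1, s2])

y = extract_intermittent_source(x, silent)
corr_target = abs(np.corrcoef(y, s1)[0, 1])
corr_interf = abs(np.corrcoef(y, s2)[0, 1])
```

In this demo `corr_target` should be close to 1 and `corr_interf` close to 0. Convolutive mixtures, noisy or partially wrong silence masks, and more sources than sensors are outside the scope of this sketch.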
