Visual Based Reference for Enhanced Audio-Video Source Extraction

This paper addresses the problem of source extraction in a complex scene where only moving audio sources are present. An algorithm using a unique yet simple method that avoids higher-order statistics has been developed. The principal idea of the algorithm is to use a video camera array to locate a moving source. The source position is then used to isolate a noise reference, so that the noise can be subtracted from the mixture with the widely known Widrow adaptive filtering method, which uses only second-order statistics. This adaptive approach provides an alternative to traditional methods, particularly when a real-time implementation is needed.
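The noise subtraction step can be illustrated with a minimal sketch of a Widrow-style adaptive noise canceller using the time-domain LMS update. This is not the paper's implementation: the function names, the tap count, the step size, the synthetic signals, and the short FIR path between the noise reference and the primary sensor are all assumptions chosen for illustration. The visual localization stage is not modeled here; the sketch simply assumes that a noise reference has already been isolated.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=8, step_size=0.01):
    """Remove from `primary` the component linearly predictable from `reference`.

    Time-domain LMS sketch (hypothetical parameters); only second-order
    quantities (products of error and reference samples) drive the update.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error is also the cleaned output
        weights = [w + 2.0 * step_size * error * h
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


def mean_square(values):
    return sum(v * v for v in values) / len(values)


def demo(num_samples=4000, seed=0):
    """Synthetic check: a sinusoidal target buried in filtered white noise."""
    rng = random.Random(seed)
    target = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(num_samples)]
    reference = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    path = [0.8, -0.4, 0.2]  # assumed acoustic path, unknown to the filter
    noise = [
        sum(path[k] * reference[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(num_samples)
    ]
    primary = [s + v for s, v in zip(target, noise)]
    cleaned, weights = lms_noise_canceller(primary, reference)
    half = num_samples // 2  # skip the initial convergence period
    mse_before = mean_square(
        [p - s for p, s in zip(primary[half:], target[half:])])
    mse_after = mean_square(
        [c - s for c, s in zip(cleaned[half:], target[half:])])
    return mse_before, mse_after, weights


if __name__ == "__main__":
    before, after, _ = demo()
    print("residual noise power before:", before)
    print("residual noise power after: ", after)
```

Each update costs a number of operations proportional to the number of taps, which is one reason an adaptive filter of this kind is attractive when a real-time implementation is needed.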
