Monaural Source Separation Using Spectral Cues

The acoustic environment poses at least two important challenges. First, animals must localise sound sources using a variety of binaural and monaural cues; second, they must separate sources into distinct auditory streams (the “cocktail party problem”). Binaural cues include interaural intensity and phase disparities. The primary monaural cue is the spectral filtering introduced by the head and pinnae, described by the head-related transfer function (HRTF), which imposes different linear filters upon sources arising at different spatial locations.
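The monaural signal model implied above can be illustrated with a minimal sketch: a single ear receives the sum of the sources, each convolved with the linear filter associated with its location. The filter taps, the function name `monaural_mixture`, and the white-noise sources are hypothetical placeholders chosen for illustration; they are not measured HRTFs and are not taken from the paper.

```python
import numpy as np

def monaural_mixture(sources, filters):
    """Return the single-channel sum of sources, each passed through its own FIR filter.

    sources: list of 1-D arrays (source waveforms)
    filters: list of 1-D arrays (assumed location-dependent impulse responses)
    """
    length = max(len(s) + len(h) - 1 for s, h in zip(sources, filters))
    y = np.zeros(length)
    for s, h in zip(sources, filters):
        filtered = np.convolve(s, h)      # linear filtering by the location's filter
        y[:len(filtered)] += filtered     # sources add at the ear
    return y

rng = np.random.default_rng(0)
s1 = rng.standard_normal(1000)            # toy source at location A
s2 = rng.standard_normal(1000)            # toy source at location B
h_a = np.array([1.0, 0.5, 0.25])          # hypothetical filter for location A
h_b = np.array([1.0, -0.6, 0.1])          # hypothetical filter for location B

y = monaural_mixture([s1, s2], [h_a, h_b])
```

Because the two locations impose different filters, the same source would leave a different spectral signature in `y` depending on where it originates, which is the cue a monaural separation method can exploit.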
