Behavioral / Systems / Cognitive

Sparse Representations for the Cocktail Party Problem

A striking feature of many sensory processing problems is that far more neurons appear to be engaged in the internal representation of a signal than in its transduction. For example, humans have roughly 30,000 cochlear neurons but at least 1000 times as many neurons in the auditory cortex. Such apparently redundant internal representations have sometimes been proposed as necessary to overcome neuronal noise. We instead posit that they directly subserve computations of interest. Here we show how sparse overcomplete linear representations can directly solve difficult acoustic signal processing problems. Our example is monaural source separation that relies solely on the cues provided by the differential filtering imposed on a source by its path from its origin to the cochlea [the head-related transfer function (HRTF)]. In contrast to much previous work, the HRTF is used here to separate auditory streams rather than to localize them in space. The experimentally testable predictions that arise from this model, including a novel method for estimating the optimal stimulus of a neuron from multineuron recording data, are generic and apply to a wide range of sensory computations.
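
As an illustration of the idea summarized above, the following minimal sketch separates a single-channel mixture by finding a sparse code in an overcomplete dictionary of filtered atoms. It is not the authors' implementation. The random dictionary `D`, the short FIR filters `h1` and `h2` standing in for two HRTFs, the penalty `lam`, and the use of iterative soft thresholding (ISTA) as the L1 solver are all assumptions made for this example.

```python
import numpy as np

def filtered_dictionary(D, h):
    """Convolve every atom (column) of D with the FIR filter h, truncated to atom length."""
    n = D.shape[0]
    return np.stack([np.convolve(D[:, j], h)[:n] for j in range(D.shape[1])], axis=1)

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, y, c, lam):
    """L1-penalized least-squares cost: 0.5*||y - A c||^2 + lam*||c||_1."""
    return 0.5 * np.sum((y - A @ c) ** 2) + lam * np.sum(np.abs(c))

def ista(A, y, lam, n_iter=500):
    """Minimize the cost above by iterative soft thresholding with step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2  # squared spectral norm: Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        c = soft_threshold(c + A.T @ (y - A @ c) / L, lam / L)
    return c

rng = np.random.default_rng(0)
n, k = 32, 48                       # signal length and atoms per source (hypothetical sizes)
D = rng.standard_normal((n, k))     # hypothetical source dictionary with unit-norm atoms
D /= np.linalg.norm(D, axis=0)

h1 = np.array([1.0, 0.5, 0.25])     # stand-in filter for the path from source position 1
h2 = np.array([1.0, -0.6, 0.3])     # stand-in filter for the path from source position 2

# Overcomplete dictionary: each atom appears once per candidate position, already filtered.
A = np.hstack([filtered_dictionary(D, h1), filtered_dictionary(D, h2)])  # shape (n, 2k)

# Synthetic mixture: each source activates two atoms; the "ear" receives the filtered sum.
c_true = np.zeros(2 * k)
c_true[[3, 20]] = [1.0, -0.8]
c_true[[k + 7, k + 30]] = [0.9, 1.2]
y = A @ c_true

lam = 0.01
c_hat = ista(A, y, lam)

# Coefficients assigned to each block give the unfiltered estimate of that source.
source1_hat = D @ c_hat[:k]
source2_hat = D @ c_hat[k:]
```

Each block of `c_hat` is tied to one filter, so reading out the blocks separately assigns signal energy to streams without explicitly localizing them. How well the estimates match the true sources depends on the dictionary, the filters, and `lam`; this sketch makes no claim about recovery quality.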