Blind One-microphone Speech Separation: A Spectral Learning Approach

We present an algorithm to perform blind, one-microphone speech separation. Our algorithm separates mixtures of speech without modeling individual speakers. Instead, we formulate speech separation as the problem of segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these features into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artificially superposing separately recorded signals. Thus the parameters of the affinity matrices can be tuned using recent work on learning spectral clustering [5]. This yields an adaptive, speech-specific segmentation algorithm that can successfully separate one-microphone speech mixtures.
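Since the abstract compresses the entire method into a few sentences, a concrete illustration may help. The Python sketch below is a minimal stand-in for the pipeline described above, not the authors' implementation: it superposes two separately recorded signals, computes a spectrogram, builds a parameterized affinity matrix over time-frequency bins, and segments the bins with standard spectral clustering. The two affinity features used here (time-frequency proximity and log-magnitude similarity) and the weights alpha and beta are simplified placeholders for the paper's psychophysically motivated cues and learned parameters; all function names and constants are assumptions made for this example.

    # Hypothetical sketch of the spectrogram-segmentation pipeline; the real
    # system uses richer psychophysical cues and learns the affinity
    # parameters from labeled artificial mixtures [5].
    import numpy as np
    from scipy.signal import stft

    def artificial_mixture(s1, s2):
        """Superpose two separately recorded signals. Because the sources are
        known, a ground-truth segmentation (which source dominates each
        time-frequency bin) is available for training."""
        n = min(len(s1), len(s2))
        return s1[:n] + s2[:n]

    def affinity_matrix(X, alpha=0.1, beta=1.0):
        """Parameterized affinity between all pairs of time-frequency bins:
        a Gaussian kernel on (frequency, time) proximity and on log-magnitude
        similarity, combined with tunable weights alpha and beta."""
        F, T = X.shape
        logmag = np.log1p(np.abs(X)).reshape(-1, 1)                 # (F*T, 1)
        ff, tt = np.meshgrid(np.arange(F), np.arange(T), indexing="ij")
        pos = np.stack([ff.ravel(), tt.ravel()], 1).astype(float)  # (F*T, 2)
        d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
        d_mag = (logmag - logmag.T) ** 2
        return np.exp(-alpha * d_pos - beta * d_mag)

    def segment(W):
        """Two-way spectral clustering: threshold the second leading
        eigenvector of the symmetrically normalized affinity matrix
        (a crude stand-in for k-means in the spectral embedding)."""
        d = W.sum(1)
        Dis = np.diag(1.0 / np.sqrt(d + 1e-12))
        _, vecs = np.linalg.eigh(Dis @ W @ Dis)   # eigenvalues ascending
        v = vecs[:, -2]
        return (v > np.median(v)).astype(int)

    # Toy usage: the two "speakers" are sinusoids so the example runs anywhere.
    fs = 8000
    t = np.arange(fs) / fs
    mix = artificial_mixture(np.sin(2 * np.pi * 250 * t),
                             np.sin(2 * np.pi * 800 * t))
    _, _, X = stft(mix, fs=fs, nperseg=256)
    X = X[:32, :32]                    # small patch keeps W (1024 x 1024) tractable
    labels = segment(affinity_matrix(X))
    mask = labels.reshape(X.shape)     # binary time-frequency mask

In the actual system, the weights playing the role of alpha and beta are not set by hand: they are tuned on artificially superposed training mixtures, whose ideal segmentations are known, using the learning spectral clustering framework of [5].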

References

[1] J. Shi and J. Malik. Normalized cuts and image segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[2] C. Fowlkes, S. Belongie, and J. Malik. Spectral partitioning with indefinite kernels using the Nyström extension. In Proc. European Conf. on Computer Vision (ECCV), 2002.

[3] A. S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, 1990.

[4] M. Zibulevsky, P. Kisilev, Y. Y. Zeevi, and B. A. Pearlmutter. Blind source separation via multinode sparse representation. In Advances in Neural Information Processing Systems (NIPS), 2001.

[5] F. R. Bach and M. I. Jordan. Learning spectral clustering. In Advances in Neural Information Processing Systems (NIPS), 2003.

[6] B. Gold and N. Morgan. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley, 1999.

[7] M. Cooke and D. P. W. Ellis. The auditory organization of speech and other sources in listeners and computational models. Speech Communication, 2001.

[8] M. Wu, D. Wang, and G. J. Brown. A multi-pitch tracking algorithm for noisy speech. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2002.

[9] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.

[10] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

[11] G. J. Brown and M. Cooke. Computational auditory scene analysis. Computer Speech and Language, 1994.

[12] G. Wahba. Spline Models for Observational Data. SIAM, 1990.

[13] F. R. Bach and M. I. Jordan. Discriminative training of hidden Markov models for multiple pitch tracking. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2005.

[14] S. T. Roweis. One microphone source separation. In Advances in Neural Information Processing Systems (NIPS), 2000.

[15] Ö. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Transactions on Signal Processing, 2004.