Exploiting the harmonic structure for speech enhancement

We present a single-channel speech enhancement method that leverages the harmonic structure of voiced speech. A sinusoidal model, based on the pitch of the speaker, is used to filter the noisy speech and remove noise components that lie between the harmonics. To remove noise that lies at the harmonic frequencies themselves, we use a noise estimation procedure that exploits the spectral sparsity of voiced speech. By measuring the power spectrum at frequencies that correspond to the zero crossings of the windowing function, we can estimate the noise level even in frames that contain voiced speech. We also provide a constrained linear least squares formulation to reduce “musical noise”, which arises from the difficulty of estimating the speech and noise power spectral densities. We show that our method yields better perceptual performance than existing methods and adapts easily to conditions in which the noise characteristics are constantly changing.
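The idea of estimating noise at the zeros of the window can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions that the abstract does not state: the pitch period is known and is an integer number of samples, the frame spans a whole number of pitch periods, the window is a periodic Hann window, and the noise is white. Under those assumptions every harmonic falls exactly on a DFT bin and the windowed speech occupies only that bin and its two neighbours, so all remaining bins carry noise only. The function name `estimate_noise_variance_at_nulls` is hypothetical.

```python
import numpy as np


def estimate_noise_variance_at_nulls(frame, period):
    """Estimate white-noise variance from one voiced frame.

    Assumptions (illustrative, not from the paper): `period` is the pitch
    period in samples, and len(frame) is an integer multiple of it.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if n % period != 0:
        raise ValueError("frame length must be a multiple of the pitch period")
    cycles = n // period  # spacing, in DFT bins, between adjacent harmonics
    if cycles < 4:
        raise ValueError("need at least 4 pitch periods per frame")

    # Periodic Hann window: its DFT is nonzero only at bins 0 and +/-1,
    # so each harmonic leaks into its two neighbouring bins and no further.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
    power = np.abs(np.fft.rfft(frame * window)) ** 2

    # Distance (in bins) from each bin to the nearest harmonic bin.
    offset = np.arange(len(power)) % cycles
    distance = np.minimum(offset, cycles - offset)

    # Bins at distance >= 2 sit on zeros of the window spectrum for every
    # harmonic, so they contain noise energy only.
    null_bins = distance >= 2

    # For white noise, E|X[k]|^2 = variance * sum(window^2).
    return power[null_bins].mean() / np.sum(window ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    period, n = 80, 640  # e.g. 100 Hz pitch at 8 kHz, 8 periods per frame
    t = np.arange(n)
    voiced = sum(np.cos(2 * np.pi * k * t / period) for k in range(1, 11))
    noisy = voiced + 0.1 * rng.standard_normal(n)
    print("estimated noise variance:", estimate_noise_variance_at_nulls(noisy, period))
```

In practice the pitch period is rarely an integer number of samples and the noise is rarely white, so a real system would need to evaluate the spectrum at off-grid frequencies and estimate the noise level per frequency band; the sketch only shows why the speech contribution vanishes at the window zeros.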
