A family of distortion measures base upon projection operation for robust speech recognition

The authors aim at the formulation of similarity measures for robust speech recognition. Their consideration focuses on the speech cepstrum derived from linear prediction coefficients (the LPC cepstrum). By using common models for noisy speech, they analytically and empirically show how the ambient noise can affect some important attributes of the LPC cepstrum such as the vector norm, coefficient order, and the direction perturbation. The new findings led them to propose a family of distortion measures based on the projection between two cepstral vectors. Performance evaluation of these measures has been conducted in both speaker-dependent and speaker-independent isolated word recognition tasks. Experimental results show that the new measures cause no degradation in recognition accuracy at high SNR, but perform significantly better when tested under noisy conditions using only clean reference templates. At an SNR of 5 dB, the new measures are shown to be able to achieve a recognition rate equivalent to that obtained by the filtered cepstral measure at 20 dB SNR, demonstrating a gain of 15 dB.<<ETX>>

[1]  Kuldip K. Paliwal,et al.  On the performance of the quefrency-weighted cepstral coefficients in vowel recognition , 1982, Speech Commun..

[2]  Biing-Hwang Juang,et al.  On the use of bandpass liftering in speech recognition , 1987, IEEE Trans. Acoust. Speech Signal Process..

[3]  R. Hellman Asymmetry of masking between noise and tone , 1972 .

[4]  L. Rabiner,et al.  Isolated and Connected Word Recognition - Theory and Selected Applications , 1981, IEEE Transactions on Communications.

[5]  J. Makhoul,et al.  Linear Prediction and the Spectral Analysis of Speech , 1972 .

[6]  Lawrence R. Rabiner,et al.  A modified K-means clustering algorithm for use in isolated work recognition , 1985, IEEE Trans. Acoust. Speech Signal Process..

[7]  James A. Cadzow ARMA Modeling of Time Series , 1982, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Brian A. Hanson,et al.  Spectral slope distance measures with linear prediction analysis for word recognition in noise , 1987, IEEE Trans. Acoust. Speech Signal Process..

[9]  Yariv Ephraim,et al.  A linear predictive front-end processor for speech recognition in noisy environments , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  D. Mansour,et al.  The short-time modified coherence representation and its application for noisy speech recognition , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.