A family of distortion measures based upon projection operation for robust speech recognition

Consideration is given to the formulation of speech similarity measures, a fundamental component in recognizer designs, that are robust to the change of ambient conditions. The authors focus on the speech cepstrum derived from linear prediction coefficients (the LPC cepstrum). By using some common models for noisy speech, they show analytically that additive white noise reduces the norm (length) of the LPC cepstral vectors. Empirical observations on the parameter histograms not only confirm the analytical results through the use of noise models but further reveal that at a given (global) signal-to-noise ratio (SNR), the norm reduction on cepstral vectors with larger norms is generally less than on vectors with smaller norms, and that lower order coefficients are more affected than higher order terms. In addition, it is found that the orientation (or direction) of the cepstral vector is less susceptible to noise perturbation than the vector norm. As a consequence of the above results, a family of distortion measures based on the projection between two cepstral vectors is proposed. The new measures have the same computational efficiency as the band-pass cepstral distortion measure.
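
The abstract does not state the definitions of the proposed measures, so the following is only a minimal sketch of the underlying idea, not the authors' formulation. It assumes a hypothetical noise effect that shrinks a cepstral vector's norm while leaving its direction unchanged, and compares the ordinary squared Euclidean distortion with an illustrative projection-style distortion (the residual of the reference vector after removing its projection onto the test vector). The function names and the example vector are assumptions made for illustration.

```python
import numpy as np


def euclidean_distortion(c_ref, c_test):
    """Squared Euclidean distance between two cepstral vectors."""
    diff = c_ref - c_test
    return float(diff @ diff)


def projection_distortion(c_ref, c_test):
    """Illustrative projection-style distortion (an assumption, not the
    paper's definition): squared norm of what remains of c_ref after
    subtracting its orthogonal projection onto c_test."""
    denom = float(c_test @ c_test)
    if denom == 0.0:
        # A zero test vector has no direction; nothing can be projected.
        return float(c_ref @ c_ref)
    scale = float(c_ref @ c_test) / denom
    residual = c_ref - scale * c_test
    return float(residual @ residual)


# Hypothetical reference cepstral vector (values chosen for illustration).
c_ref = np.array([1.2, -0.8, 0.5, 0.3])

# Hypothetical noisy version: norm reduced, orientation preserved.
c_noisy = 0.6 * c_ref

d_euclid = euclidean_distortion(c_ref, c_noisy)
d_proj = projection_distortion(c_ref, c_noisy)
print("Euclidean distortion: ", d_euclid)
print("Projection distortion:", d_proj)
```

Under this simplified assumption the Euclidean distortion penalizes the norm reduction, whereas the projection-style distortion depends only on the directional mismatch and is therefore close to zero. Real noisy speech also perturbs the orientation to some degree, so this is a limiting case rather than a realistic result.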
