Paper Information - On the use of residual cepstrum in speech recognition

On the use of residual cepstrum in speech recognition

In speech recognition based on LPC analysis, the prediction residues are usually ignored; only the LPC-derived cepstral coefficients (LPCC) are used to compose feature vectors. In this study, a number of parameters (called the residual cepstrum, or RCEP) were calculated from these residues, and their effectiveness for speech recognition was evaluated. It was shown that the RCEP do contain useful information and, in particular, that they are complementary to the LPCC. In an evaluation experiment, using the LPCC jointly with a few RCEP coefficients improved the recognition rate on the English E-set letters from 54% to 67% with an HMM-based recognizer and from 69% to 71% with a DTW-based recognizer. In addition, the Mel-scaled FFT-based cepstrum (MFCC) was found to be superior to the LPCC.
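
The abstract does not spell out how the RCEP are computed, so the following is only a minimal sketch of one plausible reading: take the LPC prediction residual of a frame, compute its real cepstrum, and append the first few coefficients to the LPCC. The function names (`lpc_autocorr`, `lpcc_from_lpc`, `residual_cepstrum`, `lpcc_plus_rcep`), the Hamming window, the LPC order of 12, the FFT size of 512, and the choice of 4 RCEP coefficients are all assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.
    Returns a with a[0] = 1, so that e[n] = sum_k a[k] * x[n - k]."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # assumes a non-silent frame (r[0] > 0)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpcc_from_lpc(a, n_coef):
    """Cepstrum of the all-pole model 1/A(z), using the standard recursion."""
    p = len(a) - 1
    c = np.zeros(n_coef + 1)
    for n in range(1, n_coef + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(1, n) if n - k <= p)
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]

def residual_cepstrum(residual, n_coef, nfft=512):
    """Real cepstrum of the residual; keeps coefficients 1..n_coef.
    Frames longer than nfft are truncated by the FFT."""
    log_spec = np.log(np.abs(np.fft.rfft(residual, nfft)) + 1e-10)
    return np.fft.irfft(log_spec, nfft)[1:n_coef + 1]

def lpcc_plus_rcep(frame, order=12, n_rcep=4):
    """Joint feature vector: LPCC followed by a few RCEP coefficients."""
    windowed = frame * np.hamming(len(frame))
    a = lpc_autocorr(windowed, order)
    # Inverse filtering with A(z) yields the prediction residual.
    residual = np.convolve(windowed, a)[:len(windowed)]
    return np.concatenate([lpcc_from_lpc(a, order),
                           residual_cepstrum(residual, n_rcep)])
```

Calling `lpcc_plus_rcep(frame)` on a one-dimensional NumPy array holding a single speech frame returns a vector of length `order + n_rcep`. A real front end would also need framing, pre-emphasis, and possibly liftering, which are left out here.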

Günther Palm | Jialong He | Li Liu
