Signal modeling for speaker identification

A large number of parameters, including pitch, LPCC, ΔLPCC, PARCOR, MFCC, ΔMFCC, and residual cepstrum (RCEP), were extracted from speech signals, and their effectiveness for text-independent speaker identification was evaluated. The usefulness of two signal processing techniques, preemphasis and cepstral weighting, was also studied. A VQ-based speaker recognition method with codebooks fine-tuned by the LVQ algorithm was used. Both LPCC and MFCC were shown to be effective representations: with a smaller number of parameters the LPCC representation performs better, but it is surpassed by MFCC when the analysis order is larger. Pitch is an independent parameter, so it can be used jointly with the spectral features. In an evaluation experiment, the correct identification rate for 112 male speakers with test utterances of less than one second reached 98.2%.
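
As a rough illustration of the VQ-based decision rule and the preemphasis step mentioned above, the following minimal sketch may help. It is not the authors' implementation: the preemphasis coefficient of 0.95, the squared Euclidean distance, the toy codebooks, and all function names are assumptions made for illustration, and the LVQ fine-tuning of the codebooks is not shown.

```python
def preemphasize(signal, alpha=0.95):
    """First-order preemphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.95 is an assumed, commonly used value, not taken from the paper.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames, codebook):
    """Mean distance from each feature frame to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames, codebooks):
    """Pick the speaker whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda spk: average_distortion(frames, codebooks[spk]))


# Toy two-dimensional "feature" codebooks, one per hypothetical speaker.
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 6.0]],
}
test_frames = [[0.1, -0.1], [0.9, 1.2]]
print(identify_speaker(test_frames, codebooks))  # prints "speaker_a"
```

In a real system the frames would be LPCC or MFCC vectors computed from the preemphasized signal, and each codebook would be trained on that speaker's enrollment data.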
