Paper information - An application of discriminative feature extraction to filter-bank-based speech recognition

A pattern recognizer is usually a modular system which consists of a feature extractor module and a classifier module. Traditionally, these two modules have been designed separately, which may not result in optimal recognition accuracy. To alleviate this fundamental problem, the authors have developed a design method, named discriminative feature extraction (DFE), that enables one to design the overall recognizer, i.e., both the feature extractor and the classifier, in a manner consistent with the objective of minimizing recognition errors. This paper investigates the application of this method to designing a speech recognizer that consists of a filter-bank feature extractor and a multi-prototype distance classifier. Carefully investigated experiments demonstrate that DFE achieves the design of a better recognizer and provides an innovative recognition-oriented analysis of the filter-bank, as an alternative to conventional analysis based on psychoacoustic expertise or heuristics.
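The idea of treating both modules as parameters of a single error-oriented objective can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the log-energy filter-bank output, the squared Euclidean distance, the hard min/max, and the toy numbers are all assumptions made for illustration. Minimum classification error formulations commonly replace the hard min and max with smooth approximations and then adjust the filter weights and the prototypes together by gradient descent on this loss.

```python
import math

def filter_bank_features(spectrum, filters):
    """Log energy at the output of each filter (one weight row per filter)."""
    return [math.log(sum(w * s for w, s in zip(row, spectrum)) + 1e-8)
            for row in filters]

def class_discriminant(feature, prototypes):
    """Negative squared distance to the closest prototype of one class."""
    return -min(sum((f - r) ** 2 for f, r in zip(feature, proto))
                for proto in prototypes)

def mce_loss(spectrum, label, filters, class_prototypes, alpha=1.0):
    """Smoothed 0/1 error for one sample, as a function of both modules."""
    feature = filter_bank_features(spectrum, filters)
    scores = [class_discriminant(feature, protos) for protos in class_prototypes]
    best_rival = max(s for j, s in enumerate(scores) if j != label)
    d = best_rival - scores[label]  # d > 0 means the sample is misclassified
    return 1.0 / (1.0 + math.exp(-alpha * d))

# Hypothetical toy setup: 2 filters over 4 spectral bins, 2 classes.
filters = [[1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
class_prototypes = [
    [[0.0, -2.0], [0.5, -1.5]],  # class 0 has two prototypes
    [[-2.0, 0.0]],               # class 1 has one prototype
]
spectrum = [1.0, 1.0, 0.1, 0.1]  # energy concentrated in the low bins

loss_if_class0 = mce_loss(spectrum, 0, filters, class_prototypes)
loss_if_class1 = mce_loss(spectrum, 1, filters, class_prototypes)
```

Because the loss depends on `filters` as well as on `class_prototypes`, lowering it can change the feature extractor and the classifier at the same time, which is the point the abstract makes about designing the overall recognizer.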

[1] Antonio M. Peinado, et al. An application of minimum classification error to feature space transformations for speech recognition, 1996, Speech Commun.

[2] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se, 1980.

[3] Alain Biem, et al. Pattern recognition using discriminative feature extraction, 1997, IEEE Trans. Signal Process.

[4] Biing-Hwang Juang, et al. New discriminative training algorithms based on the generalized probabilistic descent method, 1991, Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop.

[5] Shigeru Katagiri, et al. A generalized probabilistic descent method, 1990.

[6] Shigeru Katagiri, et al. A Telephone-based recognition system adaptively trained using Minimum Classification Error/Generalized Probabilistic Descent (MCE/GPD), 1995.

[7] Shigeru Katagiri, et al. Prototype-based minimum classification error/generalized probabilistic descent training for various speech units, 1994, Comput. Speech Lang.

[8] Biing-Hwang Juang, et al. The segmental K-means algorithm for estimating parameters of hidden Markov models, 1990, IEEE Trans. Acoust. Speech Signal Process.

[9] Biing-Hwang Juang, et al. Discriminative learning for minimum error classification [pattern recognition], 1992, IEEE Trans. Signal Process.

[10] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[11] Hervé Bourlard, et al. Continuous speech recognition using multilayer perceptrons with hidden Markov models, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[12] Nathan Intrator, et al. A Neural Network for Feature Extraction, 1989, NIPS.

[13] Alain Biem, et al. Feature extraction based on minimum classification error/generalized probabilistic descent method, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14] Biing-Hwang Juang, et al. Discriminative feature extraction for speech recognition, 1993, Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop.

[15] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[16] David G. Lowe, et al. Optimized Feature Extraction and the Bayes Decision in Feed-Forward Classifier Networks, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[17] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.

[18] Nathan Intrator, et al. Feature Extraction Using an Unsupervised Neural Network, 1992, Neural Computation.

[19] Shigeru Katagiri, et al. Discriminative metric design for robust pattern recognition, 1997, IEEE Trans. Signal Process.

[20] Shun-ichi Amari, et al. A Theory of Adaptive Pattern Classifiers, 1967, IEEE Trans. Electron. Comput.

[21] Li Deng, et al. HMM-based speech recognition using state-dependent, discriminatively derived transforms on mel-warped DFT features, 1997, IEEE Trans. Speech Audio Process.

[22] Alain Biem, et al. Cepstrum-based filter-bank design using discriminative feature extraction training at various levels, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23] E. Zwicker, et al. Analytical expressions for critical‐band rate and critical bandwidth as a function of frequency, 1980.