An application of discriminative feature extraction to filter-bank-based speech recognition

A pattern recognizer is usually a modular system consisting of a feature extractor module and a classifier module. Traditionally, these two modules have been designed separately, which may not yield optimal recognition accuracy. To alleviate this fundamental problem, the authors have developed a design method, named discriminative feature extraction (DFE), that enables one to design the overall recognizer, i.e., both the feature extractor and the classifier, in a manner consistent with the objective of minimizing recognition errors. This paper investigates the application of this method to designing a speech recognizer that consists of a filter-bank feature extractor and a multi-prototype distance classifier. Carefully designed experiments demonstrate that DFE yields a better recognizer and provides an innovative, recognition-oriented analysis of the filter bank, as an alternative to conventional analysis based on psychoacoustic expertise or heuristics.
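To make the idea of a single recognition-oriented objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes Gaussian-shaped filters on a power spectrum, squared Euclidean distances to class prototypes, and a simplified smoothed error that compares the correct class only with its best rival. All function and parameter names (`filter_bank_features`, `centers`, `widths`, `alpha`, and so on) are hypothetical. The point it illustrates is that the loss depends on the filter-bank parameters and the prototypes together, so both could be adjusted against the same error criterion.

```python
import numpy as np

def filter_bank_features(power_spectrum, centers, widths):
    """Log energies of a bank of filters (Gaussian shape is an assumption)."""
    bins = np.arange(power_spectrum.shape[0])
    # weights[k, f]: response of filter k at frequency bin f
    weights = np.exp(-0.5 * ((bins[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    return np.log(weights @ power_spectrum + 1e-10)

def class_distances(features, prototypes):
    """Squared Euclidean distance to the nearest prototype of each class.

    prototypes has shape (n_classes, n_prototypes, n_filters).
    """
    d = np.sum((prototypes - features) ** 2, axis=2)
    return d.min(axis=1)

def mce_loss(distances, label, alpha=1.0):
    """Smoothed error count: sigmoid of (correct distance - best rival distance).

    Simplification: only the single closest rival class is considered.
    """
    rivals = np.delete(distances, label)
    misclassification = distances[label] - rivals.min()
    return 1.0 / (1.0 + np.exp(-alpha * misclassification))

def recognizer_loss(power_spectrum, label, centers, widths, prototypes):
    """Loss of the whole recognizer as a function of extractor and classifier parameters."""
    feats = filter_bank_features(power_spectrum, centers, widths)
    return mce_loss(class_distances(feats, prototypes), label)
```

Because `recognizer_loss` takes `centers`, `widths`, and `prototypes` as arguments, a gradient-based optimizer could in principle update all three from the same loss, which is the contrast with designing the filter bank and the classifier separately.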
