Discriminative feature and model design for automatic speech recognition

Mazin Rahim, Yoshua Bengio and Yann LeCun
AT&T Labs Research, 600 Mountain Avenue, Murray Hill, New Jersey 07974, USA

ABSTRACT

A system for discriminative feature and model design is presented for automatic speech recognition. Training based on minimum classification error with a single objective function is applied for designing a set of parallel networks performing feature transformation and a set of hidden Markov models performing speech recognition. This paper compares the use of linear and non-linear functional transformations when applied to conventional recognition features, such as spectrum or cepstrum. It also provides a framework for integrated feature and model training when using class-specific transformations. Experimental results on telephone-based connected digit recognition are presented.
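As a rough illustration of the minimum classification error criterion mentioned in the abstract, the sketch below computes a smoothed error loss for a single token from a list of class scores. It follows the commonly used formulation (a soft maximum over competing classes passed through a sigmoid), which may differ from the exact objective used in the paper. The function name `mce_loss` and the default values of `eta` and `gamma` are assumptions made for this example.

```python
import math


def mce_loss(scores, correct, eta=2.0, gamma=1.0):
    """Smoothed minimum classification error loss for one token.

    scores  : discriminant scores g_j(x), one per class
              (for example, HMM log-likelihoods)
    correct : index of the true class
    eta     : weight given to the strongest competitor (assumed value)
    gamma   : slope of the sigmoid smoothing (assumed value)
    """
    competitors = [g for j, g in enumerate(scores) if j != correct]

    # Soft maximum over the competing classes, computed stably by
    # factoring out the largest competitor score before exponentiating.
    m = max(competitors)
    mean_exp = sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    anti_score = m + math.log(mean_exp) / eta

    # Misclassification measure: positive when a competitor outscores
    # the true class, negative when the token is classified correctly.
    d = anti_score - scores[correct]

    # The sigmoid turns the measure into a differentiable 0-1 error count.
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Usage: the loss falls below 0.5 when the true class has the best score.
print(mce_loss([0.0, -5.0, -6.0], correct=0))
print(mce_loss([-5.0, 0.0, -6.0], correct=0))
```

Because the loss is differentiable in every score, its gradient can in principle be propagated both to the recognizer parameters and to a feature transformation that produces the scores, which is what allows a single objective to train the two jointly.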
