Statistical modeling of pronunciation and production variations for speech recognition

In this paper, we propose a procedure for training a pronunciation network with criteria consistent with the optimality objectives of speech recognition systems. In particular, we describe a framework that uses the maximum likelihood (ML) and minimum classification error (MCE) criteria for pronunciation network optimization. The ML criterion is used to obtain an optimal structure for the pronunciation network based on statistically derived phonological rules. Discrimination among different pronunciation networks is achieved by weighting the networks, with the weights optimized under the MCE criterion. Experimental results demonstrate improvements in speech recognition accuracy after applying the statistically derived phonological rules. They also show that the impact of pronunciation network weighting on recognition performance is determined by the size of the recognition vocabulary.
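As a rough illustration of how the two criteria divide the work, here is a small Python sketch. It is not the authors' implementation, and the abstract does not specify the network topology or how the weights enter the recognizer's score. The sketch makes two assumptions. First, per-word pronunciation-variant probabilities are estimated by ML, which for a multinomial model reduces to the relative frequencies of the observed variants. Second, each word's network adds a log-weight `w[j]` to its acoustic log-likelihood, and these weights are trained by gradient descent on the commonly used MCE misclassification measure with a sigmoid loss. All function names, hyperparameters (`eta`, `gamma`, `lr`, `epochs`), and the toy scores are hypothetical.

```python
import math
from collections import Counter

def ml_variant_probs(observed_variants):
    """ML estimate of pronunciation-variant probabilities for one word.

    Under a multinomial model the ML estimate is simply the relative
    frequency of each observed surface variant (e.g. from forced alignments).
    """
    counts = Counter(observed_variants)
    total = sum(counts.values())
    return {variant: c / total for variant, c in counts.items()}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def competitor_softmax(g, k, eta):
    """Max score, normalizer and softmax weights over competitors j != k."""
    others = [j for j in range(len(g)) if j != k]
    m = max(g[j] for j in others)  # shift scores for numerical stability
    exps = {j: math.exp(eta * (g[j] - m)) for j in others}
    z = sum(exps.values())
    return m, z, {j: e / z for j, e in exps.items()}

def misclassification(g, k, eta):
    """MCE measure d_k = -g_k + (1/eta) log[(1/(M-1)) sum_{j!=k} exp(eta g_j)]."""
    m, z, _ = competitor_softmax(g, k, eta)
    return -g[k] + m + math.log(z / (len(g) - 1)) / eta

def mce_train_weights(samples, n_words, eta=1.0, gamma=1.0, lr=0.5, epochs=50):
    """Learn one additive log-weight per word network by MCE gradient descent.

    samples: list of (loglik, k); loglik[j] is the acoustic log-likelihood of
    the utterance under word j's pronunciation network, k the correct word.
    ASSUMPTION (hypothetical form): score g_j = loglik[j] + w[j].
    """
    w = [0.0] * n_words
    for _ in range(epochs):
        for loglik, k in samples:
            g = [loglik[j] + w[j] for j in range(n_words)]
            loss = sigmoid(gamma * misclassification(g, k, eta))
            dl_dd = gamma * loss * (1.0 - loss)  # derivative of sigmoid loss
            _, _, p = competitor_softmax(g, k, eta)
            w[k] += lr * dl_dd  # dd/dw_k = -1, so the correct word's weight rises
            for j, pj in p.items():
                w[j] -= lr * dl_dd * pj  # dd/dw_j = p_j for each competitor
    return w

def count_errors(samples, w):
    """Number of samples whose top-scoring word is not the correct one."""
    errors = 0
    for loglik, k in samples:
        g = [l + wj for l, wj in zip(loglik, w)]
        if max(range(len(g)), key=g.__getitem__) != k:
            errors += 1
    return errors

# Usage on hypothetical toy scores: word 0 is systematically under-scored.
print(ml_variant_probs(["t", "t", "dx", "t"]))  # {'t': 0.75, 'dx': 0.25}
toy = [([-10.0, -9.5], 0)] * 3 + [([-12.0, -9.0], 1)] * 3
print(count_errors(toy, [0.0, 0.0]))  # 3
w = mce_train_weights(toy, n_words=2)
print(count_errors(toy, w))  # 0
```

The additive log-weight is used here only because its MCE gradient is simple. A multiplicative weight on each network's log-likelihood would follow the same recipe with a different derivative, and nothing in the abstract indicates which form the paper actually uses.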

Biing-Hwang Juang | Filipp Korkmazskiy
