A generalized construction of integrated speech recognition transducers

We showed in previous work that weighted finite-state transducers provide a common representation for many components of a speech recognition system, and described general algorithms for combining these representations into a single optimized, compact transducer that integrates all of these components and maps directly from HMM states to words. This approach works well for certain well-controlled input transducers, but with more general transducers it raises problems concerning the efficiency of composition and the applicability of determinization and weight pushing. We generalize our prior construction of the integrated speech recognition transducer to work with an arbitrary number of component transducers and, to a large extent, lift the constraints imposed on the type of input transducers by providing more general solutions to these problems. This generalization allows us to handle cases where our prior optimization did not apply. Our experiments on the AT&T HMIHY 0300 task and an AT&T VoiceTone task demonstrate the efficiency of our generalized optimization technique. We report a 1.6-fold recognition speed-up on the HMIHY 0300 task, a 1.8-fold speed-up on a VoiceTone task using a word-based language model, and a 1.7-fold speed-up using a class-based model.
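To make the notion of combining component transducers concrete, the following is a minimal sketch of weighted composition in the tropical semiring, where weights add along a path and the best path has minimum total weight. It is an illustration only, not the construction described above: it assumes epsilon-free transducers and omits determinization and weight pushing. The dictionary layout, the function names `compose` and `best_cost`, and the toy transducers `L` and `G` are all hypothetical.

```python
from collections import deque

# Hypothetical representation: a transducer is a dict with
#   "start":  start state
#   "finals": {state: final weight}
#   "arcs":   list of (src, input_label, output_label, weight, dst)

def compose(a, b):
    """Epsilon-free composition of two weighted transducers (tropical semiring)."""
    start = (a["start"], b["start"])
    arcs, finals = [], {}
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        # A pair state is final only if both components are final.
        if qa in a["finals"] and qb in b["finals"]:
            finals[(qa, qb)] = a["finals"][qa] + b["finals"][qb]
        for (sa, ia, oa, wa, da) in a["arcs"]:
            if sa != qa:
                continue
            for (sb, ib, ob, wb, db) in b["arcs"]:
                # Match the output label of `a` with the input label of `b`.
                if sb != qb or ib != oa:
                    continue
                dst = (da, db)
                arcs.append(((qa, qb), ia, ob, wa + wb, dst))
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return {"start": start, "finals": finals, "arcs": arcs}

def best_cost(t):
    """Minimum weight of any accepting path, by repeated relaxation.

    Assumes there are no negative-weight cycles.
    """
    inf = float("inf")
    dist = {t["start"]: 0.0}
    states = {t["start"]} | {arc[0] for arc in t["arcs"]} | {arc[4] for arc in t["arcs"]}
    for _ in range(len(states)):
        for (src, _ilabel, _olabel, w, dst) in t["arcs"]:
            if src in dist and dist[src] + w < dist.get(dst, inf):
                dist[dst] = dist[src] + w
    return min((dist[q] + fw for q, fw in t["finals"].items() if q in dist), default=inf)

# Toy "lexicon": the symbol "a" may be read as word "x" or word "y".
L = {"start": 0, "finals": {1: 0.0},
     "arcs": [(0, "a", "x", 1.0, 1), (0, "a", "y", 0.5, 1)]}
# Toy "grammar": prefers word "x" over word "y".
G = {"start": 0, "finals": {1: 0.0},
     "arcs": [(0, "x", "x", 0.25, 1), (0, "y", "y", 2.0, 1)]}

LG = compose(L, G)
```

In this toy case the composed transducer maps "a" directly to a word, with each arc carrying the combined weight of the two components. A real system would chain several such compositions and then optimize the result.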
