Paper Information - Efficient Dynamic WFST Decoding for Personalized Language Models

Efficient Dynamic WFST Decoding for Personalized Language Models

We propose a two-layer cache mechanism to speed up dynamic WFST decoding with personalized language models. The first layer is a public cache that stores most of the static part of the graph and is shared globally among all users. The second layer is a private cache that stores the graph representing the personalized language model and is shared only among the utterances from a particular user. We also propose two simple yet effective pre-initialization methods: one based on breadth-first search, and another based on a data-driven exploration of decoder states using previous utterances. Experiments on a speech recognition task for calling with a personalized contact list demonstrate that the proposed public cache reduces decoding time by a factor of three compared to decoding without pre-initialization. Using the private cache provides additional efficiency gains, reducing the decoding time by a factor of five.
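The two-layer idea can be illustrated with a toy sketch. This is not the authors' implementation: the class `TwoLayerCache`, the functions `bfs_preinitialize`, `make_expand`, and `is_personal`, and the string-labelled states are all hypothetical names chosen for illustration, and a real decoder would cache lazily composed WFST states and arcs rather than strings. The sketch assumes that states behind a `$CONTACT` placeholder are user-specific and that every other state is static.

```python
from collections import deque


def make_expand(contacts):
    """Build a toy lazy-expansion function (hypothetical stand-in for on-the-fly composition)."""
    def expand(state):
        if state == "start":
            return ["call", "text"]
        if state in ("call", "text"):
            return ["$CONTACT"]              # placeholder for the personalized sub-graph
        if state == "$CONTACT":
            return ["contact:" + name for name in contacts]
        if state.startswith("contact:"):
            return ["end"]
        return []
    return expand


def is_personal(state):
    """Assumption: states derived from the contact list are user-specific."""
    return state.startswith("$") or state.startswith("contact:")


class TwoLayerCache:
    def __init__(self, expand, public):
        self.expand = expand
        self.public = public     # shared by all users
        self.private = {}        # shared only by one user's utterances
        self.misses = 0

    def arcs(self, state):
        """Return successors of a state, expanding it only on a cache miss."""
        layer = self.private if is_personal(state) else self.public
        if state not in layer:
            self.misses += 1
            layer[state] = self.expand(state)
        return layer[state]


def bfs_preinitialize(cache, start, max_states):
    """Warm the public cache breadth-first, skipping user-specific states."""
    queue, seen, visited = deque([start]), {start}, 0
    while queue and visited < max_states:
        state = queue.popleft()
        if is_personal(state):
            continue
        visited += 1
        for nxt in cache.arcs(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


# Warm the shared public cache once, before any user-specific decoding.
public = {}
bfs_preinitialize(TwoLayerCache(make_expand([]), public), "start", max_states=100)

# Decode one path for a user whose contact list is ["bob", "carol"].
alice = TwoLayerCache(make_expand(["bob", "carol"]), public)
for s in ["start", "call", "$CONTACT", "contact:bob", "end"]:
    alice.arcs(s)
```

In this toy setup the static states reached before the placeholder are served from the pre-initialized public cache, while the contact-specific states are expanded once and then kept in the user's private cache for later utterances.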

Jun Liu | Fuchun Peng | Vishal Kathuria | Jiedan Zhu

[1] Fernando Pereira, et al. Weighted finite-state transducers in speech recognition, 2002, Comput. Speech Lang.

[2] Hideki Kashioka, et al. A Specialized WFST Approach for Class Models and Dynamic Vocabulary, 2012, INTERSPEECH.

[3] Mehryar Mohri, et al. Speech Recognition with Weighted Finite-State Transducers, 2008.

[4] Johan Schalkwyk, et al. OpenFst: A General and Efficient Weighted Finite-State Transducer Library, 2007, CIAA.

[5] Johan Schalkwyk, et al. A generalized composition algorithm for weighted finite-state transducers, 2009, INTERSPEECH.

[6] Keikichi Hirose, et al. Dynamic Grammars with Lookahead Composition for WFST-based Speech Recognition, 2012, INTERSPEECH.

[7] Ian McGraw, et al. Personalized speech recognition on mobile devices, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] Sadaoki Furui, et al. Implementation and evaluation of fast on-the-fly WFST composition algorithms, 2008, INTERSPEECH.

[9] Cyril Allauzen, et al. Pre-initialized composition for large-vocabulary speech recognition, 2013, INTERSPEECH.

[10] Cyril Allauzen, et al. Improved recognition of contact names in voice commands, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).