Using partial morphological analysis in language modeling estimation for large vocabulary Portuguese speech recognition

To achieve an acceptable degree of generalization, current speech recognition systems work with large vocabularies, which, among other effects, result in larger search spaces and consequently lower system performance. For highly inflectional languages, such as Portuguese, a much larger vocabulary is required for the same task coverage, and a much larger text corpus is needed to extract word-based statistics with the same reliability. In this paper we present a new approach that applies basic morphological analysis, based on the decomposition of regular verbs into their morphemes (roots and suffixes), to a Portuguese large vocabulary continuous speech recognition system. This approach reduces not only the vocabulary size, and therefore the language model perplexity, but also the rate of out-of-vocabulary (OOV) words and the memory requirements. Preliminary results show an improvement of about 20% in recognition speed with a slight degradation in word error rate (WER).
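The decomposition idea can be illustrated with a minimal Python sketch. It is not the system described in the paper: the root list, the suffix list (present-tense and infinitive endings of regular -ar verbs), the "+" marker on suffix units, and the toy corpus are all assumptions chosen for illustration. The sketch splits each recognized verb form into a root unit and a suffix unit, so that many inflected forms share a small set of units in the language model vocabulary.

```python
# Hypothetical, tiny inventory of regular -ar verb roots and endings.
# A real system would rely on a full morphological lexicon instead.
ROOTS = {"fal", "cant", "and"}
SUFFIXES = ["amos", "ais", "am", "as", "ar", "o", "a"]  # longer endings first


def decompose(word):
    """Split a known regular verb form into [root, '+suffix'].

    Words that do not match a known root plus a listed ending are
    returned unchanged as a single-unit list.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in ROOTS:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


# Toy corpus (assumed for illustration only).
corpus = (
    "falo falas fala falamos cantar canto cantas cantamos "
    "ando andas andam casa"
).split()

# Unit sequence on which a root/suffix language model would be estimated.
decomposed = [unit for word in corpus for unit in decompose(word)]

word_vocabulary = set(corpus)
unit_vocabulary = set(decomposed)

if __name__ == "__main__":
    print("word vocabulary size:", len(word_vocabulary))
    print("unit vocabulary size:", len(unit_vocabulary))
```

In this sketch each root is stored once and the suffix units are shared across all roots, so the unit vocabulary grows roughly with the number of roots plus the number of endings rather than with their product. Recognized unit sequences would have to be rejoined into full words before scoring WER.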
