The development of a speaker-independent continuous speech recognizer for Portuguese

The development and evaluation of large-vocabulary, speaker-independent continuous speech recognition systems have mainly been carried out for American English. In this paper we present the work done to date on the development of a hybrid large-vocabulary, speaker-independent continuous speech recognition system for European Portuguese. Because no sufficiently large speech and text database was available for developing such a system, we started collecting a large database and, at the same time, began developing a baseline system based on a smaller database. On this baseline system we applied techniques for automatic segmentation and labeling, in parallel with the development of a basic lexicon and language model for Portuguese. In the last part of this paper we also present the first steps of our work on the new database.