A large vocabulary continuous speech recognition hybrid system for the portuguese language

Due to the enormous development of large vocabulary, speaker-independent continuous speech recognition systems, which occur essentially for the US English language, there is a large demand of this kind of systems for other languages. In this paper we present the work done in the development of a large vocabulary, speaker-independent continuous speech recognition hybrid system for the European Portuguese language. This is a difficult task due to the basic development stage of this technology in the European Portuguese language. The development of a system of this kind for a new language depends on the availability of the appropriate source components, mainly a speech corpus and large amounts of texts. This work became possible due to the development of a new database (BD-PUBLICO), a large vocabulary speech corpus for the European Portuguese language developed by us over the last two years.