Generalized mixture of HMMs for continuous speech recognition

This paper presents a new technique for modeling heterogeneous data sources, such as speech signals received over distinctly different channels. Such a scenario arises when an automatic speech recognition system is deployed in wireless telephony, where highly heterogeneous channels coexist and interoperate. The problem is that a simple model may be inadequate to describe the diversity of the signal accurately, resulting in unsatisfactory recognition performance. To deal with this problem, we propose a generalized mixture model (GMM) approach. For speech signals in particular, we use mixtures of hidden Markov models (GMHMM, generalized mixture of HMMs). By applying discriminative training to the GMHMM, we obtained a 1.0% word error rate for the recognition of digit strings from the wireless database, compared with a 1.4% word error rate for the conventional HMM-based discriminative technique.
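
To make the idea of a mixture of HMMs concrete, the following is a minimal sketch of how such a mixture could score an observation sequence: each component HMM is evaluated with the forward algorithm, and the component likelihoods are combined with mixture weights. This is an illustration under stated assumptions, not the formulation used in the paper. The abstract does not specify the model, so the discrete emission tables, the two-component setup, and all names (`hmm_log_likelihood`, `mixture_log_likelihood`, `hmm_a`, `hmm_b`) are hypothetical. A speech recognizer would typically use continuous-density emissions instead.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def hmm_log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain for a discrete-output HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


def mixture_log_likelihood(obs, weights, hmms):
    """log p(obs) = log sum_k w_k * p(obs | HMM_k), with weights summing to 1."""
    return logsumexp(
        [safe_log(w) + hmm_log_likelihood(obs, *h) for w, h in zip(weights, hmms)]
    )


# Two hypothetical component HMMs, e.g. one per channel condition.
hmm_a = ([0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.3, 0.7]])
hmm_b = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.6, 0.4]])
obs = [0, 1, 1, 0]

ll_a = hmm_log_likelihood(obs, *hmm_a)
ll_b = hmm_log_likelihood(obs, *hmm_b)
ll_mix = mixture_log_likelihood(obs, [0.5, 0.5], [hmm_a, hmm_b])
print(ll_a, ll_b, ll_mix)
```

The mixture score is never lower than the weighted contribution of its best-matching component, which is one intuition for why a mixture can cover channel conditions that a single HMM describes poorly.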