Statistical segmentation and word modeling techniques in isolated word recognition

A speech recognition system is described that combines statistical segment modeling with word modeling. Segment models are constructed by first segmenting the training data automatically and then grouping the resulting segments into clusters. Each segment cluster is modeled by a mixture of Gaussian densities. To integrate the segment models into word models, a generalization of the hidden Markov model approach is proposed. Experimental results on a multispeaker alpha-digit recognition task demonstrate that the new approach improves on the performance of conventional whole-word models. In particular, the word models discriminate well between phonetically similar words, such as the letters of the E-set.
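
To illustrate the idea of modeling each segment cluster with a mixture of Gaussian densities, the following is a minimal sketch, not the authors' implementation. It assumes each segment has already been reduced to a fixed-length feature vector, that covariances are diagonal, and that the mixture parameters are given; all function names and the example parameters are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_density(x, weights, means, variances):
    """Log density of a Gaussian mixture at x, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_segment(x, cluster_models):
    """Return the name of the cluster whose mixture assigns x the highest density.

    cluster_models maps a cluster name to a (weights, means, variances) tuple.
    """
    return max(
        cluster_models,
        key=lambda name: log_mixture_density(x, *cluster_models[name]),
    )


# Hypothetical one-dimensional example with two segment clusters.
example_models = {
    "cluster_a": ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]]),
    "cluster_b": ([1.0], [[5.0]], [[1.0]]),
}
best_cluster = classify_segment([4.5], example_models)
```

In a word model of the kind the abstract describes, scores like these would play the role of per-state observation likelihoods; how they are combined across segments is specific to the proposed generalization and is not shown here.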