Connectionist Viterbi training: a new hybrid method for continuous speech recognition

A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented. CVT can be run iteratively and can be applied to large-vocabulary recognition tasks. Successful completion of training the connectionist component of the system, despite the large network size and volume of training data, depends largely on several measures taken to reduce learning time. The system is trained and tested on the TI/NBS speaker-independent continuous-digits database. Performance on test data for unknown-length strings is 98.5% word accuracy and 95.0% string accuracy. Several improvements to the current system are expected to increase these accuracies significantly.
