Global optimization of a neural network-hidden Markov model hybrid

The integration of multilayered and recurrent artificial neural networks (ANNs) with hidden Markov models (HMMs) is addressed. ANNs are well suited to approximating functions that compute new acoustic parameters, whereas HMMs have proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for the global optimization of all the parameters of the combined system. Results of speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.
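To make the coupling concrete, the following is a minimal sketch of how a gradient can flow from an HMM back into a network whose outputs serve as the HMM observations. It is not the algorithm of the paper; the abstract does not state the training criterion, the emission densities, or the network architecture. The sketch assumes a maximum-likelihood criterion, diagonal-Gaussian emissions, and a single tanh layer, and every name in it (`ann_forward`, `loglik_and_grad`, `W`, `mu`, `var`) is hypothetical. Only the network weights are differentiated here; a global optimization would also update the HMM parameters.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def ann_forward(W, x):
    # Assumed network: one tanh layer mapping frames x (T, D_in) to observations y (T, D_out).
    return np.tanh(x @ W)

def log_emissions(y, mu, var):
    # Assumed emissions: diagonal Gaussians, one per state. Returns log b_i(y_t), shape (T, S).
    diff = y[:, None, :] - mu[None, :, :]
    return -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=2)

def forward_backward(log_b, log_pi, log_A):
    # Log-domain forward-backward; log_A[i, j] = log P(state j at t+1 | state i at t).
    T, S = log_b.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = log_pi + log_b[0]
    for t in range(1, T):
        alpha[t] = log_b[t] + logsumexp(alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(log_A + (log_b[t + 1] + beta[t + 1])[None, :], axis=1)
    log_lik = logsumexp(alpha[-1], axis=0)
    gamma = np.exp(alpha + beta - log_lik)  # state posteriors, shape (T, S)
    return log_lik, gamma

def loglik_and_grad(W, x, mu, var, log_pi, log_A):
    # Log-likelihood of the ANN outputs under the HMM, and its gradient w.r.t. W.
    y = ann_forward(W, x)
    log_lik, gamma = forward_backward(log_emissions(y, mu, var), log_pi, log_A)
    # d logL / d y_t = sum_i gamma_t(i) * (mu_i - y_t) / var_i
    dy = np.einsum("ts,tsd->td", gamma, (mu[None] - y[:, None]) / var)
    # Chain rule through y = tanh(x W).
    dW = x.T @ (dy * (1.0 - y ** 2))
    return log_lik, dW

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 5))            # 20 synthetic frames, 5 input features
    W = 0.1 * rng.normal(size=(5, 2))       # network weights
    mu = rng.normal(size=(3, 2))            # 3 states, 2-dimensional observations
    var = np.ones((3, 2))
    log_pi = np.log(np.full(3, 1.0 / 3.0))
    log_A = np.log(np.full((3, 3), 1.0 / 3.0))
    before, dW = loglik_and_grad(W, x, mu, var, log_pi, log_A)
    after, _ = loglik_and_grad(W + 1e-3 * dW, x, mu, var, log_pi, log_A)
    print(before, after)                    # one small gradient-ascent step on W
```

The design point is that the HMM state posteriors weight the error signal sent to the network at each frame, so the network is trained on the sequence-level criterion rather than on frame-level targets.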
