A Continuous Speech Recognition System Embedding MLP into HMM

We are developing a phoneme-based, speaker-dependent continuous speech recognition system that embeds a Multilayer Perceptron (MLP), i.e., a feedforward Artificial Neural Network, into a Hidden Markov Model (HMM) approach. In [Bourlard & Wellekens], it was shown that MLPs approximate Maximum a Posteriori (MAP) probabilities and can thus be embedded in HMMs as emission probability estimators. By using contextual information from a sliding window over the input frames, we have been able to improve frame or phoneme classification performance relative to that obtained with simple Maximum Likelihood (ML) or even MAP probabilities estimated without the benefit of context. However, the MLP did not improve the recognition of words in continuous speech as readily, and several modifications of the original scheme were necessary to obtain acceptable performance. We show here that, for a simple discrete density HMM system, word recognition performance appears to be somewhat better when MLP methods are used to estimate the emission probabilities.
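Why MLP outputs can serve as MAP (posterior) estimates is easiest to see from a short derivation. The sketch below assumes a squared-error training criterion with 1-of-K targets; the abstract does not state which criterion was used, and cross-entropy leads to the same minimizer. Here X denotes the input window centred on a frame, g_k(X; θ) the k-th MLP output, and q_k the k-th HMM state (phoneme class); these symbols are introduced for illustration only.

```latex
\begin{aligned}
\mathcal{E}(\theta) &= \mathbb{E}_{X,t}\Big[\sum_{k=1}^{K}\big(g_k(X;\theta)-t_k\big)^2\Big],
\qquad t_k = \begin{cases}1 & \text{if } X \text{ belongs to } q_k\\ 0 & \text{otherwise}\end{cases}\\
\mathbb{E}\big[(g_k - t_k)^2 \mid X\big] &= \big(g_k - \mathbb{E}[t_k \mid X]\big)^2 + \operatorname{Var}(t_k \mid X)\\
\Rightarrow\quad g_k^{\ast}(X) &= \mathbb{E}[t_k \mid X] = P(q_k \mid X)\\
\frac{p(X \mid q_k)}{p(X)} &= \frac{P(q_k \mid X)}{P(q_k)}
\end{aligned}
```

The variance term does not depend on the network, so with enough capacity and training data the optimal outputs equal the class posteriors. The last line is Bayes' rule: dividing each posterior by its class prior gives a likelihood scaled by p(X). Since p(X) is the same for every state, the product of these factors over an utterance is common to all state sequences, so the best Viterbi path is unchanged. This division is one standard way of turning posteriors into emission scores; whether it corresponds to the specific modifications mentioned above is not stated in this abstract.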
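A minimal sketch of a generic MLP/HMM hybrid pipeline, written under stated assumptions rather than as the authors' implementation, can make the pieces above concrete. First, each input frame is stacked with its c left and c right neighbours (the sliding context window); frames beyond the utterance edges are assumed to be replaced by the nearest edge frame. Second, per-frame MLP posteriors are converted into scaled log-likelihoods by dividing by class priors (an assumed Bayes-rule conversion). Third, a Viterbi search picks the best state sequence. The function names `context_windows`, `scaled_log_likelihoods` and `viterbi`, the two-state topology, and all numbers are hypothetical; the posterior matrix stands in for the output of a trained MLP.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def context_windows(frames: Sequence[Sequence[float]], c: int) -> List[List[float]]:
    """Concatenate each frame with its c left and c right neighbours (2c+1 frames).

    Assumption: out-of-range neighbours are replaced by the nearest edge frame.
    """
    n_frames = len(frames)
    windows = []
    for n in range(n_frames):
        window: List[float] = []
        for offset in range(-c, c + 1):
            idx = min(max(n + offset, 0), n_frames - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        windows.append(window)
    return windows


def scaled_log_likelihoods(posteriors, priors):
    """Map posteriors P(q_k|X_n) to log[p(X_n|q_k)/p(X_n)] = log P(q_k|X_n) - log P(q_k)."""
    return [[math.log(p) - math.log(prior) for p, prior in zip(row, priors)]
            for row in posteriors]


def viterbi(log_emis, log_trans, log_init):
    """Return the most likely state sequence given per-frame log emission scores."""
    n_states = len(log_init)
    score = [log_init[k] + log_emis[0][k] for k in range(n_states)]
    backptrs = []
    for frame in log_emis[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor state i for current state j
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + frame[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy usage: one-dimensional frames, a context of one frame on each side
frames = [[0.0], [1.0], [2.0], [3.0]]
windows = context_windows(frames, c=1)  # windows[0] == [0.0, 0.0, 1.0]

# Hypothetical MLP posteriors for a two-state left-to-right model
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]
log_emis = scaled_log_likelihoods(posteriors, priors)
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_init = [0.0, NEG_INF]
print(viterbi(log_emis, log_trans, log_init))  # [0, 0, 1, 1]
```

Working in the log domain avoids numerical underflow over long utterances, and forbidden transitions are expressed as negative infinity rather than as a log of zero (which would raise an error).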

[1]  M. Stone. Cross-validation: a review, 1978.

[2]  Shozo Makino, et al. Recognition of consonant based on the perceptron model, 1983, ICASSP.

[3]  Nelson Morgan, et al. "Ignorance-based" systems, 1984, ICASSP.

[4]  Roger K. Moore, et al. Experiments in Isolated Digit Recognition Using the Multi-Layer Perceptron, 1987.

[5]  Xavier L. Aubert. Supervised segmentation with application to speech recognition, 1987, ECST.

[6]  Lokendra Shastri, et al. Learning Phonetic Features Using Connectionist Networks, 1987, IJCAI.

[7]  Hermann Ney, et al. Phoneme modelling using continuous mixture densities, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[8]  Alex Waibel, et al. Phoneme recognition: neural networks vs. hidden Markov models, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[9]  Hervé Bourlard, et al. Generalization and Parameter Estimation in Feedforward Nets: Some Experiments, 1989, NIPS.

[10]  Hervé Bourlard, et al. Statistical Inference in Multilayer Perceptrons and Hidden Markov Models with Applications in Continuous Speech Recognition, 1989, NATO Neurocomputing.

[11]  M. A. Bush, et al. How limited training data can allow a neural network to outperform an 'optimal' statistical classifier, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[12]  H. Bourlard, et al. Links Between Markov Models and Multilayer Perceptrons, 1990, IEEE Trans. Pattern Anal. Mach. Intell.