Training mixture density HMMs with SOM and LVQ

This paper presents experiments on, and discusses, how some neural network algorithms can improve phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs, the stochastic observation process associated with each state is modelled by estimating the probability density function of the short-time observations in that state as a mixture of Gaussian densities. Learning Vector Quantization (LVQ) is used to increase the discrimination between different phoneme models, both during the initialization of the Gaussian codebooks and during the actual MDHMM training. The Self-Organizing Map (SOM) is applied to provide a suitably smoothed mapping of the training vectors, which accelerates the convergence of the actual training. The resulting codebook topology can also be exploited in the recognition phase to speed up the approximation of the observation probabilities. The experiments with LVQ and SOMs show reductions in both the average phoneme recognition error rate and the computational load compared with maximum likelihood training and Generalized Probabilistic Descent (GPD). The lowest final error rate, however, is obtained by applying several training algorithms successively. Relative to the online system, a further reduction of about 40% in the error rate is obtained by using the same training methods with more advanced, higher-dimensional feature vectors.
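As a rough illustration of the two building blocks named in the abstract, the following minimal Python sketch evaluates the log-density of an observation under a Gaussian mixture and performs one update of the basic LVQ1 rule. It is not the paper's implementation. The function names, the diagonal-covariance assumption, the learning rate `alpha`, and the toy values are all assumptions made for this example, and the plain LVQ1 rule stands in for whichever LVQ variant the experiments actually use.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x (assumed form)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_density(x, weights, means, variances):
    """Log of sum_k w_k * N(x; mean_k, var_k), using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
        if w > 0.0
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def lvq1_step(x, label, codebook, labels, alpha):
    """One LVQ1 update, modifying codebook in place.

    The nearest codebook vector moves toward x when its class label
    matches the label of x, and away from x otherwise.
    Returns the index of the winning codebook vector.
    """
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook]
    win = dists.index(min(dists))
    sign = 1.0 if labels[win] == label else -1.0
    codebook[win] = [
        ci + sign * alpha * (xi - ci) for xi, ci in zip(x, codebook[win])
    ]
    return win


# Toy usage: two 2-D mixture components, one per hypothetical phoneme class.
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.5]
log_p = log_mixture_density([0.2, 0.0], weights, means, variances)

codebook = [list(m) for m in means]
winner = lvq1_step([0.2, 0.0], "a", codebook, ["a", "b"], alpha=0.5)
```

In this sketch the codebook vectors double as the Gaussian means, which reflects the idea of using LVQ to initialize the Gaussian codebooks. How the mixture weights and variances are tied to the codebook afterwards is left open here.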
