Mapping context dependent acoustic information into context independent form by LVQ

Abstract In the framework of phonemic speech recognition using Hidden Markov Models (HMMs) together with codebooks trained by Learning Vector Quantization (LVQ), a novel way to model context dependencies in speech is presented. We use LVQ to map context-dependent acoustic data into a context-independent phonemic form. The acoustic data consists of concatenated averages of successive short-time feature vectors. This mapping eliminates the need for context-dependent phonemic HMMs (for example, triphone HMMs) and the difficulties associated with them; simpler context-independent discrete-observation HMMs suffice. We report excellent results on a speaker-dependent recognition task for Finnish.
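To make the idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: each frame is represented by concatenated averages of neighbouring short-time feature vectors, an LVQ1-style codebook with context-independent phoneme labels is trained on these context vectors, and quantizing a new vector to its nearest prototype yields the discrete observation symbol passed to the HMMs. The window span, learning rate, and function names below are assumptions chosen for illustration.

```python
# Illustrative sketch of LVQ-based mapping of context-dependent acoustic
# vectors to context-independent phoneme labels. All parameters (window
# span, learning rate, number of epochs) are assumptions, not values from
# the paper.
import numpy as np

def context_window(features, t, span=3):
    """Concatenate the averages of `span` preceding and `span` following
    short-time feature vectors around frame t with the frame itself."""
    left = features[max(0, t - span):t]
    right = features[t + 1:t + 1 + span]
    parts = [
        left.mean(axis=0) if len(left) else features[t],
        features[t],
        right.mean(axis=0) if len(right) else features[t],
    ]
    return np.concatenate(parts)

def lvq1_train(vectors, labels, prototypes, proto_labels, alpha=0.05, epochs=5):
    """Basic LVQ1: move the nearest prototype toward a vector with the same
    phoneme label and away from a vector with a different label."""
    protos = prototypes.astype(float).copy()
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            i = np.argmin(np.linalg.norm(protos - x, axis=1))
            step = alpha if proto_labels[i] == y else -alpha
            protos[i] += step * (x - protos[i])
    return protos

def quantize(vectors, prototypes, proto_labels):
    """Map each context vector to the phoneme label of its nearest prototype,
    producing context-independent discrete observations for the HMMs."""
    dists = np.linalg.norm(prototypes[None, :, :] - vectors[:, None, :], axis=2)
    return [proto_labels[i] for i in np.argmin(dists, axis=1)]
```

In this sketch the HMMs never see triphone-specific symbols: contextual variation is absorbed by the codebook, and the emitted label sequence is defined over the context-independent phoneme inventory only.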
