Status Report Of The Finnish Phonetic Typewriter Project

In connection with a speech recognizer whose aim is to produce phonemic transcriptions of arbitrary spoken utterances, we investigate the combined effect of several improvements at different stages of phoneme recognition. The core of the basic recognition system is Learning Vector Quantization (LVQ1) [1]. This algorithm was originally used to classify FFT-based short-time feature vectors into phonemic classes. The phonemic decoding stage was earlier based on simple durational rules [2] [3]. At the feature level, we now study the effect of using mel-scale cepstral features and of concatenating consecutive feature vectors to include context. At the output of vector quantization, we compare three approaches to taking into account the classifications of feature vectors in local context. The rule-based phonemic decoding is compared with decoding that employs Hidden Markov Models (HMMs). As earlier, an optional grammatical post-correction method (DEC) is applied. Experiments conducted with three male speakers indicate that the phonemic transcription accuracy of the previous configuration can be increased significantly. By using appropriately liftered cepstra, concatenating three adjacent feature vectors, and using HMM-based phonemic decoding, the error rate can be decreased from 14.0 % to 5.8 %.
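As an illustration of two steps named in the abstract, the following minimal sketch shows how consecutive feature vectors might be concatenated to include context, and what a single update of the standard LVQ1 rule looks like. It is not the project's implementation: the function names `stack_context` and `lvq1_step`, the learning rate `alpha`, the padding of edge frames by repetition, and the use of squared Euclidean distance are all assumptions made for this example.

```python
from typing import List, Sequence

Vector = List[float]


def stack_context(frames: Sequence[Sequence[float]], width: int = 1) -> List[Vector]:
    """Concatenate each frame with `width` neighbours on each side.

    With width=1 this yields three adjacent feature vectors per output.
    Edge frames are padded by repeating the first/last frame (an assumption;
    the abstract does not say how edges are handled).
    """
    n = len(frames)
    stacked: List[Vector] = []
    for t in range(n):
        vec: Vector = []
        for k in range(t - width, t + width + 1):
            k = min(max(k, 0), n - 1)  # clamp the index at the utterance edges
            vec.extend(frames[k])
        stacked.append(vec)
    return stacked


def lvq1_step(codebook: List[Vector], labels: Sequence[str],
              x: Sequence[float], y: str, alpha: float = 0.05) -> int:
    """Apply one LVQ1 update in place and return the index of the winner.

    The nearest codebook vector moves towards x if its class label matches y,
    and away from x otherwise. `alpha` is a hypothetical learning rate.
    """
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    c = min(range(len(codebook)), key=lambda i: sqdist(codebook[i], x))
    sign = 1.0 if labels[c] == y else -1.0
    codebook[c] = [m + sign * alpha * (xi - m) for m, xi in zip(codebook[c], x)]
    return c
```

In such a setup, the stacked vectors from `stack_context` would serve as the inputs `x` to `lvq1_step` during training, so that each classification decision sees the preceding and following frame as well as the current one.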

[1] Anne-Marie Derouault, et al. Context-dependent phonetic Markov models for large vocabulary speech recognition, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Y. Tohkura, et al. A weighted cepstral distance measure for speech recognition, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Olli Ventä, et al. Phonetic typewriter for Finnish and Japanese, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[4] Teuvo Kohonen, et al. The self-organizing map, 1990.

[5] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[6] Kari Torkkola, et al. A combination of neural network and low-level AI-techniques to transcribe speech into phonemes, 1991.

[7] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se, 1980.

[8] Teuvo Kohonen, et al. The 'neural' phonetic typewriter, 1988, Computer.

[9] Mikko Kokkonen, et al. A comparison of two methods to transcribe speech into phonemes: a rule-based method vs. back-propagation, 1990, ICSLP.

[10] Steven W. Zucker, et al. On the Foundations of Relaxation Labeling Processes, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.