Evaluation of speech recognizers for speech training applications

The use of speech recognition technology for speech training represents an important and potentially very large application of speech technology. However, speech training places unique demands on recognizer performance that have not been well characterized. In this research, a database and testing procedures were developed to evaluate two facets of recognizer performance integral to speech training: utterance identification and speech quality assessment. Using these materials, three commercial speech recognizers that employ different types of recognition algorithms were evaluated. In general, the recognizer based on hidden Markov models (HMMs) provided better identification scores for both normal and disordered speech than the two template-based recognizers. A recognizer's identification performance on normal speech often predicted its identification performance on disordered speech. For each recognizer, analysis using phonological features revealed classes of speech sounds that were poorly discriminated. Procedures were developed to obtain human ratings of the quality of disordered speech for comparison with recognizer performance, and the recognizers were compared with speech-language pathologists in their ability to judge speech quality. In contrast to identification performance, the two template-based recognizers provided better measures of speech quality than the HMM-based recognizer.
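The two evaluation facets can be illustrated with a small sketch. It is not the authors' procedure, whose exact scoring and statistics are not given here. It assumes that identification is scored as the percentage of test tokens whose recognized label matches the intended label, and that the feature analysis counts, for each phonological feature, the misrecognitions that change that feature's value. The feature table (`FEATURES`) is a hypothetical toy example. For quality assessment, it assumes that agreement between a recognizer's per-utterance scores (for example, template distances) and mean clinician ratings is summarized with a Pearson correlation. That choice of statistic is also an assumption.

```python
from collections import Counter
from math import sqrt

# Hypothetical feature table: each sound label maps to its feature values.
FEATURES = {
    "p": {"voice": "-", "place": "labial"},
    "b": {"voice": "+", "place": "labial"},
    "t": {"voice": "-", "place": "alveolar"},
    "d": {"voice": "+", "place": "alveolar"},
}


def identification_score(pairs):
    """Percent of (target, response) pairs in which the response equals the target."""
    if not pairs:
        raise ValueError("no test tokens")
    correct = sum(1 for target, response in pairs if target == response)
    return 100.0 * correct / len(pairs)


def feature_errors(pairs, features):
    """Count, per feature, the misrecognitions whose response changes that feature's value."""
    errors = Counter()
    for target, response in pairs:
        if target == response:
            continue  # correct responses carry no feature errors
        for name, value in features[target].items():
            if features[response][name] != value:
                errors[name] += 1
    return errors


def pearson_r(xs, ys):
    """Pearson correlation between recognizer scores and human ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least two items")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Toy data: intended sound vs. recognizer output.
pairs = [("p", "p"), ("p", "b"), ("b", "b"), ("t", "d"), ("d", "d"), ("d", "b")]
score = identification_score(pairs)        # 3 of 6 correct -> 50.0
errs = feature_errors(pairs, FEATURES)     # voice: 2, place: 1

# Toy quality data: larger distance = worse match; higher rating = better quality.
distances = [1.0, 2.0, 3.0, 4.0]
ratings = [7.0, 5.0, 3.0, 1.0]
r = pearson_r(distances, ratings)          # -1.0 for this perfectly inverse toy data
```

In this toy data most errors change voicing rather than place, which is the kind of pattern a feature analysis exposes as a poorly discriminated class. A distance-based score is expected to correlate negatively with quality ratings, so the sign as well as the magnitude of `r` matters when comparing recognizers.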
