Lexical stress recognition using hidden Markov models

A probabilistic algorithm is described for estimating the lexical stress pattern of English words from the acoustic signal using hidden Markov models (HMMs) with continuous asymmetric Gaussian probability density functions. Adopting a binary stressed-unstressed syllable classification strategy, two five-state HMMs of the left-to-right type were generated, one for each stress value. Training observation vectors were extracted from a corpus of bisyllabic stress-minimal word pairs and consisted of nine acoustic measurements based on fundamental frequency, syllabic energy and coarse linear prediction spectra. Evaluation of both models using a set of recordings of the same word pairs yielded an average stress recognition rate of 94%.
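The decision rule implied by the two-model setup can be illustrated with a minimal sketch: score a syllable's observation sequence under each five-state left-to-right HMM with the forward algorithm and pick the stress value whose model gives the higher likelihood. Everything below is an assumption made for illustration and not the paper's implementation: ordinary diagonal-covariance Gaussians stand in for the asymmetric Gaussian densities, the features are two-dimensional toy values rather than the nine acoustic measurements, the parameters are hand-picked rather than trained, and the names `left_to_right_transitions`, `log_likelihood`, `classify` and `toy_model` are hypothetical.

```python
import math


def left_to_right_transitions(n_states, stay=0.6):
    """Transition matrix allowing only a self-loop or a step to the next state."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
        else:
            A[i][i] = 1.0  # final state can only loop on itself
    return A


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (stand-in emission model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, A, means, variances):
    """Forward algorithm in the log domain; the sequence must start in state 0."""
    n = len(A)
    log_A = [[math.log(a) if a > 0.0 else -math.inf for a in row] for row in A]
    alpha = [-math.inf] * n
    alpha[0] = log_gauss_diag(obs[0], means[0], variances[0])
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    # Assumption: the sequence may end in any state, so sum over all of them.
    return logsumexp(alpha)


def classify(obs, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


def toy_model(level, n_states=5, dim=2):
    """Hand-picked (untrained) parameters, purely for illustration."""
    A = left_to_right_transitions(n_states)
    means = [[level + 0.1 * s] * dim for s in range(n_states)]
    variances = [[1.0] * dim for _ in range(n_states)]
    return A, means, variances


models = {"stressed": toy_model(2.0), "unstressed": toy_model(-2.0)}
syllable = [[2.1, 1.9], [2.0, 2.2], [2.3, 2.1], [2.2, 2.4]]
print(classify(syllable, models))
```

In practice the model parameters would be estimated from training data rather than set by hand, and each syllable of a word would be classified in turn to obtain the word's stress pattern.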