Lexical stress estimation and phonological knowledge

Abstract: It is argued that the prosodic feature stress is useful in constraining the number of hypotheses a speech recognition system produces. A probabilistic algorithm is described for the estimation of the lexical stress pattern of English words from the acoustic signal using hidden Markov models (HMMs) with continuous asymmetric Gaussian probability density functions. Adopting binary (stressed or unstressed) syllable models, two five-state HMMs of the left-to-right type were generated, one for each value of the binary opposition. Training observation vectors were extracted from a corpus of bisyllabic stress-minimal word pairs, where each word occurred in a continuously spoken sentence. The vectors consisted of nine acoustic measurements based on fundamental frequency, syllabic energy and coarse linear prediction spectra. Evaluation of both stressed and unstressed models using a new set of recordings of the same word pairs yielded an average syllable-stress recognition rate of 94%.
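The two-model setup described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. The sketch scores a syllable's frame sequence against two five-state left-to-right HMMs and picks the label with the higher forward log-likelihood; the abstract does not state the decision rule, so this comparison is an assumption. Ordinary symmetric diagonal Gaussians stand in for the asymmetric densities, two toy features stand in for the nine acoustic measurements, and all parameters are set by hand rather than trained. The names `forward_log_likelihood`, `classify_syllable` and `toy_model` are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance (symmetric) Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))

def left_to_right_log_trans(n_states, stay=0.6):
    """Log transitions allowing only a self-loop or a step to the next state."""
    log_a = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            log_a[i][i] = math.log(stay)
            log_a[i][i + 1] = math.log(1.0 - stay)
        else:
            log_a[i][i] = 0.0  # last state can only loop on itself
    return log_a

def forward_log_likelihood(obs, means, variances, log_a):
    """Forward algorithm in the log domain; the model always starts in state 0."""
    n = len(means)
    alpha = [NEG_INF] * n
    alpha[0] = log_gauss_diag(obs[0], means[0], variances[0])
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_a[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    # Assumption: the sequence may end in any state, so sum over all of them.
    return logsumexp(alpha)

def classify_syllable(obs, models):
    """Return the label whose HMM gives the higher log-likelihood, plus all scores."""
    scores = {
        label: forward_log_likelihood(obs, *params)
        for label, params in models.items()
    }
    return max(scores, key=scores.get), scores

def toy_model(level, n_states=5, n_features=2):
    """Hand-set parameters for illustration only; a real system would train them."""
    means = [[level] * n_features for _ in range(n_states)]
    variances = [[1.0] * n_features for _ in range(n_states)]
    return means, variances, left_to_right_log_trans(n_states)

models = {"stressed": toy_model(1.0), "unstressed": toy_model(-1.0)}

# Six frames of two made-up features (think normalised pitch and energy).
frames = [[0.9, 1.1], [1.2, 0.8], [1.0, 1.0], [0.7, 1.3], [1.1, 0.9], [0.8, 1.0]]
label, scores = classify_syllable(frames, models)
print(label)
```

In this toy setting the two models differ only in their emission means, so the decision reduces to which mean the frames lie closer to. With trained parameters, the left-to-right topology would also let each state capture a different portion of the syllable.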
