Modelling of speech using primarily prosodic parameters

Abstract Hidden Markov models are used in an experiment to investigate how state occupancy corresponds to prosodic parameters and spectral balance. To define separate sub-classes in the data using a maximum likelihood approach, a single model was trained in which individual states correspond to different categories, with no prior assumption about the structure of the data; the data were not manually segmented, and predefined categories were not modelled separately. The results indicate that the prosodic parameters carry significant segmental information, but the results, which are based on the time-alignment of the model states with the feature vectors, are in a form that is not directly usable in a recognition environment. Classification of phonetic categories is particularly consistent for vowels and nasals and is generally better for voiced than for unvoiced speech. The classification is also robust to segmental effects on the data: states align consistently with segments regardless of the type of neighbouring phonemes.
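
The time-alignment of model states with feature vectors mentioned in the abstract can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the two-state model, the diagonal-covariance Gaussian emissions, the two-dimensional frames (standing in for, say, normalised pitch and energy) and every numeric value below are assumptions chosen only for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def viterbi_align(frames, log_init, log_trans, means, variances):
    """Return the most likely state index for each frame (Viterbi path)."""
    n_states = len(means)
    # delta[j]: best log score of any state sequence ending in state j
    delta = [
        log_init[j] + log_gauss(frames[0], means[j], variances[j])
        for j in range(n_states)
    ]
    backptr = []  # backptr[t][j]: best predecessor of state j at frame t + 1
    for x in frames[1:]:
        prev, delta, step = delta, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            step.append(best_i)
            delta.append(
                prev[best_i]
                + log_trans[best_i][j]
                + log_gauss(x, means[j], variances[j])
            )
        backptr.append(step)
    # Trace back from the best final state to recover the alignment.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state model and four hypothetical feature frames.
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.1), math.log(0.9)],
]
frames = [[0.1, 0.0], [0.0, 0.2], [2.9, 3.1], [3.0, 2.8]]

path = viterbi_align(frames, log_init, log_trans, means, variances)
print(path)
```

Grouping frames by their aligned state, and comparing those groups with a phonetic labelling, is one way to examine how state occupancy relates to phonetic categories when the states have not been tied to categories in advance.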
