Fusion of global statistical and segmental spectral features for speech emotion recognition

Speech emotion recognition is a challenging speech technology with a broad range of applications. In this paper, we propose to fuse global statistical features and segmental spectral features at the decision level for speech emotion recognition. Each emotional utterance is scored independently by two recognition systems, one based on global statistics and one based on segmental spectra, and a weighted linear combination of their scores yields the final decision. Experimental results on an emotional speech database demonstrate that the global statistical and segmental spectral features are complementary, and that the proposed fusion approach improves recognition performance over either system alone.
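The decision-level fusion described above can be illustrated with a minimal sketch. It assumes that both systems emit one score per emotion class on a comparable scale and that the two weights sum to one; the function names, the emotion labels, the example scores, and the weight value are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_scores(global_scores, segmental_scores, weight=0.5):
    """Weighted linear combination of two per-class score vectors.

    `weight` is applied to the global-statistics system and
    `1 - weight` to the segmental-spectrum system. The constraint that
    the two weights sum to one is an assumption of this sketch.
    """
    g = np.asarray(global_scores, dtype=float)
    s = np.asarray(segmental_scores, dtype=float)
    if g.shape != s.shape:
        raise ValueError("both systems must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * g + (1.0 - weight) * s


def decide(fused_scores, labels):
    """Return the label with the highest fused score."""
    return labels[int(np.argmax(fused_scores))]


# Hypothetical labels and scores for a single utterance.
EMOTIONS = ["anger", "happiness", "neutral", "sadness"]
global_scores = [0.10, 0.40, 0.30, 0.20]
segmental_scores = [0.50, 0.20, 0.20, 0.10]

fused = fuse_scores(global_scores, segmental_scores, weight=0.4)
prediction = decide(fused, EMOTIONS)
print(prediction)
```

In practice the weight would be tuned on held-out data, and scores from the two systems may need normalization before they are combined; neither step is shown here.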
