Ensemble Methods for Continuous Affect Recognition: Multi-modality, Temporality, and Challenges

In this paper, we present a multi-modal system based on audio, video, and bio-physiological features for the continuous recognition of human affect in unconstrained scenarios. We leverage the robustness of ensemble classifiers as base learners and refine their predictions using stochastic-gradient-descent-based optimization of the desired loss function. Furthermore, we discuss pre- and post-processing steps that improve the robustness of the regression and, consequently, the prediction quality.
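The refinement step can be illustrated with a minimal sketch. It rests on assumptions not stated above. First, the base learners' outputs are stacked column-wise into a matrix `P`. Second, refinement means learning a linear combination `P @ w + b`. Third, the desired loss is 1 - CCC, where CCC is the concordance correlation coefficient, a common target in continuous affect recognition. The helper names `ccc`, `ccc_loss_grad`, and `refine_weights` are hypothetical, and this is not presented as the authors' exact procedure.

```python
import numpy as np


def ccc(x, y):
    """Concordance correlation coefficient using population statistics."""
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)


def ccc_loss_grad(x, y):
    """Return the loss 1 - CCC(x, y) and its analytic gradient w.r.t. x."""
    n = x.size
    mx, my = x.mean(), y.mean()
    num = 2.0 * np.mean((x - mx) * (y - my))      # N = 2 * cov(x, y)
    den = x.var() + y.var() + (mx - my) ** 2      # D = var_x + var_y + (mu_x - mu_y)^2
    # dCCC/dx_i = (2/n) * [(y_i - mu_y) * D - N * (x_i - mu_y)] / D^2
    dccc = (2.0 / n) * ((y - my) * den - num * (x - my)) / den ** 2
    return 1.0 - num / den, -dccc


def refine_weights(P, y, lr=0.1, epochs=100, batch=200, seed=0):
    """Hypothetical refinement: fit x = P @ w + b by mini-batch SGD on 1 - CCC.

    Assumes P has shape (n_frames, n_base_learners) and y has shape (n_frames,).
    """
    rng = np.random.default_rng(seed)
    n, m = P.shape
    w = np.full(m, 1.0 / m)  # start from a plain average of the base learners
    b = 0.0
    for _ in range(epochs):
        # Contiguous windows keep local temporal order; their order is shuffled.
        starts = rng.permutation(np.arange(0, n - batch + 1, batch))
        for s in starts:
            Pb, yb = P[s:s + batch], y[s:s + batch]
            _, g = ccc_loss_grad(Pb @ w + b, yb)
            w -= lr * (Pb.T @ g)   # chain rule: dL/dw = P^T dL/dx
            b -= lr * g.sum()      # dL/db = sum_i dL/dx_i
    return w, b
```

Because CCC penalizes differences in mean and scale as well as weak correlation, it does not decompose into a sum over frames. The sketch therefore estimates the gradient on each mini-batch, which only approximates the full-sequence gradient; larger windows give a closer approximation.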
