Revisiting the EmotiW challenge: how wild is it really?

This work addresses emotion recognition in the wild based on a multitude of audio, visual, and meta features. To this end, we propose a method that optimizes multi-modal fusion architectures using evolutionary computing. Extensive uni- and multi-modal experiments demonstrate the discriminative power of each computed feature set and of the resulting fusion architectures. Furthermore, we summarize the EmotiW 2013/2014 challenges, review the conclusions that have been drawn from them, and compare our results with the state of the art on this dataset.
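To make the evolutionary fusion idea concrete: the abstract does not specify the genome encoding or the genetic operators, so the following is only a minimal sketch under assumed choices. It encodes a fusion architecture as one weight per modality (audio, visual, meta), scores each genome by validation accuracy of the weighted decision-level fusion, and evolves the population with truncation selection, uniform crossover, and Gaussian mutation. All names, parameters, and the stand-in data are illustrative, not the paper's actual setup.

```python
# Hedged sketch: evolutionary optimization of decision-level fusion weights.
# Assumes each modality produces per-class posterior scores; the genome is
# a non-negative weight vector over modalities. Synthetic stand-in data.
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES, N_CLASSES, N_MODALITIES = 200, 7, 3  # e.g. audio, visual, meta

# Hypothetical per-modality class posteriors and ground-truth labels.
scores = rng.dirichlet(np.ones(N_CLASSES), size=(N_MODALITIES, N_SAMPLES))
labels = rng.integers(0, N_CLASSES, size=N_SAMPLES)

def fitness(weights):
    """Validation accuracy of the weighted-sum fusion for one genome."""
    fused = np.tensordot(weights, scores, axes=1)  # -> (N_SAMPLES, N_CLASSES)
    return np.mean(fused.argmax(axis=1) == labels)

def evolve(pop_size=30, generations=50, mutation_sigma=0.1):
    # Genomes: non-negative modality weights, normalized to sum to 1.
    pop = rng.random((pop_size, N_MODALITIES))
    pop /= pop.sum(axis=1, keepdims=True)
    for _ in range(generations):
        fit = np.array([fitness(w) for w in pop])
        # Truncation selection: keep the best half as parents.
        parents = pop[np.argsort(fit)[-pop_size // 2:]]
        # Offspring: uniform crossover plus Gaussian mutation.
        n_children = pop_size - len(parents)
        pairs = rng.integers(0, len(parents), size=(n_children, 2))
        mask = rng.random((n_children, N_MODALITIES)) < 0.5
        children = np.where(mask, parents[pairs[:, 0]], parents[pairs[:, 1]])
        children += rng.normal(0.0, mutation_sigma, children.shape)
        children = np.clip(children, 1e-6, None)
        children /= children.sum(axis=1, keepdims=True)
        pop = np.vstack([parents, children])
    fit = np.array([fitness(w) for w in pop])
    return pop[fit.argmax()], fit.max()

best_weights, best_acc = evolve()
print("best modality weights:", best_weights, "accuracy:", best_acc)
```

In a real setting the random posteriors would be replaced by held-out predictions from the per-modality classifiers, and the genome could be extended beyond flat weights to encode richer architectures (e.g. which classifiers feed which fusion node), which is closer to what optimizing "fusion architectures" suggests.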
