Seeing is Worse than Believing: Reading People's Minds Better than Computer-Vision Methods Recognize Actions

Human subjects performed a six-class action-recognition task on video stimuli while undergoing functional magnetic resonance imaging (fMRI). Support-vector machines (SVMs) were trained on the recovered brain scans to classify the actions observed during imaging, yielding an average classification accuracy of 69.73% when tested on scans from the same subject and 34.80% when tested on scans from different subjects. An apples-to-apples comparison was performed with all publicly available software that implements state-of-the-art action recognition, using the same video corpus, the same cross-validation regimen, and the same partitioning into training and test sets. These methods yielded classification accuracies between 31.25% and 52.34%. This indicates that one can read people’s minds better than state-of-the-art computer-vision methods can perform action recognition, at least when the classifier is trained and tested on scans from the same subject; the cross-subject accuracy falls within the range achieved by the computer-vision methods.
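
The reported accuracies are easier to compare against a chance baseline. The short sketch below does that arithmetic in Python. It assumes the six classes are balanced, so that chance is 1/6; the abstract does not state this, and the dictionary labels and the helper `margin_over_chance` are illustrative names, not taken from the paper.

```python
# Accuracies (percent) as reported in the abstract.
NUM_CLASSES = 6

# Assumption: balanced classes, so guessing gives 1/6 accuracy.
chance = 100.0 / NUM_CLASSES

reported = {
    "fMRI SVM, same subject": 69.73,
    "fMRI SVM, different subjects": 34.80,
    "computer vision, lowest": 31.25,
    "computer vision, highest": 52.34,
}


def margin_over_chance(accuracy, chance_level=chance):
    """Return how many percentage points an accuracy lies above chance."""
    return accuracy - chance_level


for name, acc in reported.items():
    margin = margin_over_chance(acc)
    print(f"{name}: {acc:.2f}% ({margin:+.2f} points relative to chance)")
```

Under that assumption every reported figure is above chance, the same-subject fMRI result exceeds the best computer-vision result, and the cross-subject fMRI result sits inside the computer-vision range.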
