An efficient multi-feature SVM solver for complex event detection

Multimedia event detection (MED) has become one of the most important visual content analysis tools as the rapid growth of the user generated videos on the Internet. Generally, multimedia data is represented by multiple features and it is difficult to gain better performance for complex event detection with only single feature. However, how to fuse different features effectively is the crucial problem for MED with multiple features. Meanwhile, exploiting multiple features simultaneously in the large-scale scenarios always produces a heavy computational burden. To address these two issues, we propose a self-adaptive multi-feature learning framework with efficient Support Vector Machine (SVM) solver for complex event detection in this paper. Our model is able to utilize multiple features reasonably with an adaptively weighted linear combination manner, which is simple yet effective, according to the various impact that different features on a specific event. In order to mitigate the expensive computational cost, we employ a fast primal SVM solver in the proposed alternating optimization algorithm to obtain the approximate solution with gradient descent method. Extensive experiment results over standard datasets of TRECVID MEDTest 2013 and 2014 demonstrate the effectiveness and superiority of the proposed framework on complex event detection.

[1]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[2]  Vania Bogorny,et al.  Toward Abnormal Trajectory and Event Detection in Video Surveillance , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[3]  Mubarak Shah,et al.  Recognizing Complex Events Using Large Margin Joint Low-Level Event Model , 2012, ECCV.

[4]  David A. Shamma,et al.  The New Data and New Challenges in Multimedia Research , 2015, ArXiv.

[5]  Richard M. Stern,et al.  Informedia e-lamp @ TRECVID 2012 multimedia event detection and recounting MED and MER , 2012 .

[6]  John Wright,et al.  Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization , 2009, NIPS.

[7]  Daniel P. Robinson,et al.  A primal-dual augmented Lagrangian , 2010, Computational Optimization and Applications.

[8]  Fei-Fei Li,et al.  Combining the Right Features for Complex Event Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Stéphane Marchand-Maillet,et al.  Information Fusion in Multimedia Information Retrieval , 2007, Adaptive Multimedia Retrieval.

[10]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[11]  Xiaojun Chang,et al.  Feature Interaction Augmented Sparse Learning for Fast Kinect Motion Detection , 2017, IEEE Transactions on Image Processing.

[12]  Alexander G. Hauptmann,et al.  MoSIFT: Recognizing Human Actions in Surveillance Videos , 2009 .

[13]  Roger Levy,et al.  A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[14]  Yi Yang,et al.  Retrieval based interactive cartoon synthesis via unsupervised bi-distance metric learning , 2009, ACM Multimedia.

[15]  Yiannis Kompatsiaris,et al.  Improving event detection using related videos and relevance degree support vector machines , 2013, MM '13.

[16]  Nicu Sebe,et al.  Event Oriented Dictionary Learning for Complex Event Detection , 2015, IEEE Transactions on Image Processing.

[17]  Vasileios Mezaris,et al.  Video event detection using generalized subclass discriminant analysis and linear support vector machines , 2014, ICMR.

[18]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[19]  Yi Yang,et al.  Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM , 2015, ICML.

[20]  Alexander G. Hauptmann,et al.  Leveraging high-level and low-level features for multimedia event detection , 2012, ACM Multimedia.

[21]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[22]  Yi Yang,et al.  Bi-Level Semantic Representation Analysis for Multimedia Event Detection , 2017, IEEE Transactions on Cybernetics.

[23]  Yi Yang,et al.  Semantic Pooling for Complex Event Analysis in Untrimmed Videos , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Yongdong Zhang,et al.  Multiview Spectral Embedding , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[25]  Xiaojun Chang,et al.  Revealing Event Saliency in Unconstrained Video Collection , 2017, IEEE Transactions on Image Processing.

[26]  Ioannis Patras,et al.  Video Event Detection Using Kernel Support Vector Machine with Isotropic Gaussian Sample Uncertainty (KSVM-iGSU) , 2016, MMM.

[27]  Yi Yang,et al.  A Convex Formulation for Semi-Supervised Multi-Label Feature Selection , 2014, AAAI.

[28]  Zi Huang,et al.  Multiple feature hashing for real-time large scale near-duplicate video retrieval , 2011, ACM Multimedia.

[29]  Feiping Nie,et al.  New primal SVM solver with linear computational cost for big data classifications , 2014, ICML 2014.

[30]  Ethem Alpaydin,et al.  Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..

[31]  Mehryar Mohri,et al.  Two-Stage Learning Kernel Algorithms , 2010, ICML.

[32]  Chengqi Zhang,et al.  Dynamic Concept Composition for Zero-Example Event Detection , 2016, AAAI.

[33]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  Xiaojun Chang,et al.  Semisupervised Feature Analysis by Mining Correlations Among Multiple Tasks , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[35]  John Shawe-Taylor,et al.  Two view learning: SVM-2K, Theory and Practice , 2005, NIPS.

[36]  Georges Quénot,et al.  TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics , 2011, TRECVID.

[37]  Hui Cheng,et al.  Evaluation of low-level features and their combinations for complex event detection in open source videos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Yi Yang,et al.  A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Chih-Jen Lin,et al.  Trust Region Newton Method for Logistic Regression , 2008, J. Mach. Learn. Res..

[40]  Zi Huang,et al.  Multi-Feature Fusion via Hierarchical Regression for Multimedia Analysis , 2013, IEEE Transactions on Multimedia.

[41]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Meng Wang,et al.  Optimizing multi-graph learning: towards a unified video annotation scheme , 2007, ACM Multimedia.

[43]  Nicu Sebe,et al.  The Many Shades of Negativity , 2017, IEEE Transactions on Multimedia.

[44]  Alexander G. Hauptmann,et al.  Instructional Videos for Unsupervised Harvesting and Learning of Action Examples , 2014, ACM Multimedia.

[45]  Florian Metze,et al.  CMU-Informedia @ TRECVID 2013 Multimedia Event Detection , 2013 .

[46]  Wei Liu,et al.  Multimedia classification and event detection using double fusion , 2013, Multimedia Tools and Applications.