Top-k multi-class SVM using multiple features

Abstract: Recent studies have shown that fusing multi-modal features improves the accuracy of visual object classification. However, for complex classification tasks with a large number of categories, existing multiple-feature fusion methods are prone to failure caused by class ambiguity. In this paper, we address this issue within a multi-view learning framework by allowing k (k ≥ 2) guesses at the top instead of considering only the class with the largest prediction score. This strategy relaxes the penalty for errors within the top-k predictions, which mitigates class ambiguity to some extent. To fuse multiple features effectively, we introduce an adaptive weight for each view and use an efficient alternating optimization algorithm to jointly learn the optimal classifiers and their corresponding weights. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed model and its superiority over state-of-the-art approaches.
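
To make the top-k idea concrete, the following is a minimal sketch, not the paper's actual formulation (which the abstract does not spell out). It assumes a commonly used top-k hinge loss, the average of the k largest margin terms clipped at zero, and a plain weighted sum for combining per-view scores. The function names `topk_error`, `topk_hinge`, and `fuse_scores`, the fixed weights, and the toy scores are all hypothetical.

```python
def topk_error(scores, label, k):
    """Return 0 if the true label is among the k highest scores, else 1."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return 0 if label in ranked[:k] else 1


def topk_hinge(scores, label, k):
    """Assumed top-k hinge loss: max(0, mean of the k largest margin terms)."""
    # Margin term for class j is 1 + s_j - s_y; it is 0 for the true class y.
    margins = [(0.0 if j == label else 1.0) + s - scores[label]
               for j, s in enumerate(scores)]
    largest = sorted(margins, reverse=True)[:k]
    return max(0.0, sum(largest) / k)


def fuse_scores(view_scores, weights):
    """Combine per-view score vectors with one weight per view (assumed linear fusion)."""
    n_classes = len(view_scores[0])
    return [sum(w * s[j] for w, s in zip(weights, view_scores))
            for j in range(n_classes)]


# Toy example: two views, three classes, true class 0.
# The weights are fixed here; the paper learns them jointly with the classifiers.
views = [[1.0, 3.0, 0.0], [3.0, 2.0, 1.0]]
fused = fuse_scores(views, [0.5, 0.5])

print(topk_error(fused, 0, 1), topk_error(fused, 0, 2))
print(topk_hinge(fused, 0, 1), topk_hinge(fused, 0, 2))
```

In this toy case the true class ranks second after fusion, so it counts as a top-1 error but not a top-2 error, and the assumed top-2 hinge loss is smaller than the top-1 loss. This is the sense in which allowing k guesses relaxes the penalty for confusing ambiguous classes.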
