Large-Scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex

Dual decomposition methods are the current state-of-the-art for training multiclass formulations of Support Vector Machines (SVMs). At every iteration, dual decomposition methods update a small subset of dual variables by solving a restricted optimization problem. In this paper, we propose an exact and efficient method for solving the restricted problem. In our method, the restricted problem is reduced to the well-known problem of Euclidean projection onto the positive simplex, which we can solve exactly in expected O(k) time, where k is the number of classes. We demonstrate that our method empirically achieves state-of-the-art convergence on several large-scale high-dimensional datasets.

[1]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[2]  Chih-Jen Lin,et al.  A sequential dual method for large scale multi-class linear svms , 2008, KDD.

[3]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[4]  S. Sundararajan,et al.  A Sequential Dual Method for Structural SVMs , 2011, SDM.

[5]  Jun Liu,et al.  Efficient Euclidean projections in linear time , 2009, ICML '09.

[6]  Koby Crammer,et al.  Confidence-weighted linear classification , 2008, ICML '08.

[7]  Naonori Ueda,et al.  Online Passive-Aggressive Algorithms for Non-Negative Matrix Factorization and Completion , 2014, AISTATS.

[8]  Koby Crammer,et al.  On the Learnability and Design of Output Codes for Multiclass Problems , 2002, Machine Learning.

[9]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[10]  Yi Lin Multicategory Support Vector Machines, Theory, and Application to the Classification of . . . , 2003 .

[11]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[12]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[13]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[14]  Yoram Singer,et al.  Efficient Learning of Label Ranking by Soft Projections onto Polyhedra , 2006, J. Mach. Learn. Res..

[15]  Peter L. Bartlett,et al.  Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..

[16]  Kilian Q. Weinberger,et al.  Feature hashing for large scale multitask learning , 2009, ICML '09.

[17]  Yoram Singer,et al.  Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.

[18]  Mark W. Schmidt,et al.  Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.

[19]  Chih-Jen Lin,et al.  A formal analysis of stopping criteria of decomposition methods for support vector machines , 2002, IEEE Trans. Neural Networks.

[20]  Kazuhiro Seki,et al.  Block coordinate descent algorithms for large-scale sparse multiclass classification , 2013, Machine Learning.

[21]  Jason Weston,et al.  Solving multiclass support vector machines with LaRank , 2007, ICML '07.