On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines

In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most of previous approaches which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.

[1]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[2]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[3]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[4]  Hans Ulrich Simon,et al.  Robust Trainability of Single Neurons , 1995, J. Comput. Syst. Sci..

[5]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[6]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[7]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[8]  Bernhard Schölkopf,et al.  Support vector learning , 1997 .

[9]  Bernhard Schölkopf,et al.  Comparing support vector machines with Gaussian kernels to radial basis function classifiers , 1997, IEEE Trans. Signal Process..

[10]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[11]  J. C. BurgesChristopher A Tutorial on Support Vector Machines for Pattern Recognition , 1998 .

[12]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[13]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[14]  Kristin P. Bennett,et al.  Multicategory Classification by Support Vector Machines , 1999, Comput. Optim. Appl..

[15]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[16]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[17]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[18]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[19]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[20]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[21]  Hélène Paugam-Moisy,et al.  A new multi-class SVM based on a uniform convergence result , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[22]  Andrzej Stachurski,et al.  Parallel Optimization: Theory, Algorithms and Applications , 2000, Scalable Comput. Pract. Exp..

[23]  Chih-Jen Lin Stopping Criteria of Decomposition Methods for Support Vector Machines: a Theoretical Justification , 2001 .

[24]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[25]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .

[26]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..