Large-Margin Convex Polytope Machine

We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks. The CPM finds a large-margin convex polytope separator that encloses one class. We develop an algorithm based on stochastic gradient descent that is amenable to massive datasets, and augment it with a heuristic procedure to avoid sub-optimal local minima. Our experimental evaluations of the CPM on large-scale datasets from distinct domains (MNIST handwritten digit recognition, text topic, and web security) demonstrate that the CPM trains models faster, sometimes by several orders of magnitude, than state-of-the-art similar approaches and kernel-SVM methods, while achieving comparable or better classification performance. Our empirical results suggest that, unlike prior similar approaches, we do not need to control the number of sub-classifiers (sides of the polytope) to avoid overfitting.
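To make the geometry concrete, the following is a minimal sketch of how a convex polytope separator can act as a classifier. It is not taken from the paper: it assumes the polytope is the intersection of K half-spaces w_k . x + b_k <= 0, that the enclosed class is labeled -1, and that a point is scored by the maximum over the sub-classifiers. The names polytope_score, predict, SQUARE_W, and SQUARE_B are hypothetical, and the weights are fixed by hand rather than learned.

```python
from typing import Sequence

Vector = Sequence[float]


def polytope_score(x: Vector, weights: Sequence[Vector], biases: Sequence[float]) -> float:
    """Return max_k (w_k . x + b_k), the score of the most violated face.

    Assumption: each (w_k, b_k) pair is one side of the polytope, and the
    interior is the region where every w_k . x + b_k is non-positive.
    """
    return max(
        sum(wi * xi for wi, xi in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


def predict(x: Vector, weights: Sequence[Vector], biases: Sequence[float]) -> int:
    """Label -1 if x lies inside (or on) the polytope, +1 otherwise.

    Assumption: the enclosed class is the one labeled -1.
    """
    return -1 if polytope_score(x, weights, biases) <= 0.0 else 1


# Hand-picked example: the unit square [0, 1]^2 written as four half-spaces.
#   -x <= 0,  x - 1 <= 0,  -y <= 0,  y - 1 <= 0
SQUARE_W = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
SQUARE_B = [0.0, -1.0, 0.0, -1.0]

inside_label = predict((0.5, 0.5), SQUARE_W, SQUARE_B)   # point inside the square
outside_label = predict((2.0, 0.5), SQUARE_W, SQUARE_B)  # point outside the square
```

Under these assumptions, a single violated face is enough to place a point outside the polytope, which is why the score is a maximum over the sub-classifiers rather than a sum or an average.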
