Mathematical Programming in Machine Learning and Data Mining

The fields of Machine Learning (ML) and Data Mining (DM) center on the following problem: given a data domain D, we want to approximate an unknown function y(x) on a given data set X ⊂ D (for which the values of y(x) may or may not be known) by a function f from a given class F, so that the approximation generalizes as well as possible to all of the (unseen) data x ∈ D. The approximating function f may take real values, as in regression; binary values, as in classification; or integer values, as in some cases of ranking. It may also be a mapping between ordered subsets of data points and ordered subsets of real, integer or binary values, as in structured object prediction.

The quality of the approximation by f can be measured by various objective functions. For instance, in support vector machine (SVM) [4] classification, the quality of the approximating function is estimated by a weighted sum of a regularization term h(f) and the hinge loss term ∑_{x∈X} max{1 − y(x)f(x), 0}. Hence, many machine learning problems can be posed as optimization problems in which the optimization is performed over the class F for a chosen objective. The connection between optimization and machine learning, although always present, became especially evident with the growing popularity of SVMs [4], [24] and of kernel methods in general [18]. The SVM classification problem is formulated as a convex quadratic program.
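To make the SVM objective concrete, the sketch below evaluates the weighted sum of a regularizer and the hinge loss for a toy data set. It assumes, for illustration only, a linear decision function f(x) = ⟨w, x⟩ + b, the regularizer h(f) = ½‖w‖², labels y(x) ∈ {−1, +1}, and a weight C on the loss term; all function names are hypothetical. It does not solve the quadratic program. It only evaluates the objective and checks the link to the QP form: minimizing ½‖w‖² + C ∑ ξ_x subject to y(x)f(x) ≥ 1 − ξ_x and ξ_x ≥ 0 is the standard soft-margin QP, and for fixed (w, b) the smallest feasible slacks are exactly the hinge losses.

```python
from typing import List, Sequence


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Assumed linear decision function f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_losses(w: Sequence[float], b: float,
                 X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Per-point hinge loss max{1 - y(x) f(x), 0}, with labels in {-1, +1}."""
    return [max(1.0 - yi * linear_score(w, b, xi), 0.0) for xi, yi in zip(X, y)]


def svm_objective(w: Sequence[float], b: float,
                  X: Sequence[Sequence[float]], y: Sequence[int],
                  C: float = 1.0) -> float:
    """Weighted sum h(f) + C * (sum of hinge losses), with h(f) = 0.5 * ||w||^2."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + C * sum(hinge_losses(w, b, X, y))


def qp_slacks_feasible(w: Sequence[float], b: float,
                       X: Sequence[Sequence[float]], y: Sequence[int],
                       slacks: Sequence[float], tol: float = 1e-12) -> bool:
    """Check the soft-margin QP constraints y_i f(x_i) >= 1 - xi_i and xi_i >= 0."""
    return all(
        yi * linear_score(w, b, x) >= 1.0 - s - tol and s >= -tol
        for x, yi, s in zip(X, y, slacks)
    )


if __name__ == "__main__":
    # Toy data: one point inside the margin (x = 0.5), two on or beyond it.
    X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
    y = [1, 1, -1]
    w, b = [1.0, 0.0], 0.0
    print(hinge_losses(w, b, X, y))        # [0.0, 0.5, 0.0]
    print(svm_objective(w, b, X, y, C=1.0))  # 1.0
```

Only the point with y(x)f(x) = 0.5 < 1 contributes to the loss. Shrinking its slack below its hinge loss violates the QP constraint, which is why the hinge-loss objective and the constrained quadratic program describe the same problem.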

[1] Nello Cristianini et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[2] Robert Tibshirani et al. The Entire Regularization Path for the Support Vector Machine, 2004, J. Mach. Learn. Res.

[3] Katya Scheinberg et al. Efficient SVM Training Using Low-Rank Kernel Representations, 2002, J. Mach. Learn. Res.

[4] Gregory Shakhnarovich et al. An investigation of computational and informational limits in Gaussian mixture clustering, 2006, ICML '06.

[5] Inderjit S. Dhillon et al. Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem, 2007, SDM.

[6] John C. Platt et al. Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods, 1999.

[7] Ji Zhu et al. Boosting as a Regularized Path to a Maximum Margin Classifier, 2004, J. Mach. Learn. Res.

[8] Peng Sun et al. Linear convergence of a modified Frank–Wolfe algorithm for computing minimum-volume enclosing ellipsoids, 2008, Optim. Methods Softw.

[9] Kilian Q. Weinberger et al. Learning a kernel matrix for nonlinear dimensionality reduction, 2004, ICML.

[10] Stephen J. Wright et al. Object-oriented software for quadratic programming, 2003, TOMS.

[11] Alexander J. Smola et al. Learning with kernels, 1998.

[12] Michael I. Jordan et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[13] Michael C. Ferris et al. Interior-Point Methods for Massive Support Vector Machines, 2002, SIAM J. Optim.

[14] Pannagadatta K. Shivaswamy. Ellipsoidal Kernel Machines, 2007.

[15] Nello Cristianini et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[16] Kilian Q. Weinberger et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[17] Ben Taskar et al. Max-Margin Markov Networks, 2003, NIPS.

[18] Thorsten Joachims et al. Making large-scale support vector machine learning practical, 1999.

[19] Katya Scheinberg et al. An Efficient Implementation of an Active Set Method for SVMs, 2006, J. Mach. Learn. Res.

[20] Michael I. Jordan et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[21] Thomas Hofmann et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[22] Nathan Srebro et al. Fast maximum margin matrix factorization for collaborative prediction, 2005, ICML.

[23] Michael I. Jordan et al. A Direct Formulation for Sparse PCA Using Semidefinite Programming, 2004, SIAM Rev.

[24] Renato D. C. Monteiro et al. Large-scale semidefinite programming via a saddle point Mirror-Prox algorithm, 2007, Math. Program.