Mathematical Programming in Machine Learning and Data Mining

The field of Machine Learning (ML) and Data Mining (DM) centers on the following problem: given a data domain D, we want to approximate an unknown function y(x) on a given data set X ⊂ D (for which the values of y(x) may or may not be known) by a function f from a given class F, so that the approximation generalizes as well as possible to all of the (unseen) data x ∈ D. The approximating function f might take real values, as in regression; binary values, as in classification; or integer values, as in some cases of ranking. It might also be a mapping between ordered subsets of data points and ordered subsets of real, integer or binary values, as in structured object prediction. The quality of the approximation by f can be measured by various objective functions. For instance, in support vector machine (SVM) [4] classification, the quality of the approximating function is estimated by a weighted sum of a regularization term h(f) and the hinge loss term ∑_{x∈X} max{1 − y(x)f(x), 0}. Hence, many machine learning problems can be posed as optimization problems, in which optimization is performed over a given class F for a chosen objective. The connection between optimization and machine learning, although always present, became especially evident with the popularity of SVMs [4], [24] and of kernel methods in general [18]. The SVM classification problem is formulated as a convex quadratic program.
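The SVM objective described above can be illustrated with a small sketch. It assumes a linear classifier f(x) = w·x + b, the regularization term h(f) = ½‖w‖², and a weight c on the hinge loss; these choices, the function names, and the toy data are illustrative assumptions rather than anything specified in the abstract. For simplicity the sketch minimizes the objective with a plain subgradient method instead of solving the convex quadratic program directly.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, b, xs, ys):
    """Sum over the data set of max{1 - y(x) f(x), 0} with f(x) = w.x + b."""
    return sum(max(1.0 - y * (dot(w, x) + b), 0.0) for x, y in zip(xs, ys))


def objective(w, b, xs, ys, c=1.0):
    """Weighted sum of the (assumed) regularizer 0.5*||w||^2 and the hinge loss."""
    return 0.5 * dot(w, w) + c * hinge_loss(w, b, xs, ys)


def train_subgradient(xs, ys, c=1.0, steps=100, lr=0.1):
    """Minimize the objective by subgradient steps; return the best (w, b) seen.

    The subgradient method is not a descent method, so the iterate with the
    lowest objective value is tracked and returned instead of the last one.
    """
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    best = (objective(w, b, xs, ys, c), list(w), b)
    for _ in range(steps):
        # Subgradient of the regularizer is w; points with margin < 1
        # contribute -c * y * x (and -c * y for the offset b).
        grad_w, grad_b = list(w), 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1.0:
                for j in range(dim):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
        value = objective(w, b, xs, ys, c)
        if value < best[0]:
            best = (value, list(w), b)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical, linearly separable toy data with labels in {+1, -1}.
    xs = [(2.0, 2.0), (1.0, 3.0), (-2.0, -1.0), (-1.0, -3.0)]
    ys = [1, 1, -1, -1]
    w, b = train_subgradient(xs, ys)
    print("w =", w, "b =", b, "objective =", objective(w, b, xs, ys))
```

The weight c sets the trade-off between the two terms: a larger c penalizes margin violations more heavily, while a smaller c favors a smaller-norm w.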

Jiming Peng | Katya Scheinberg | Tomaso Poggio | Tamas Terlaky | Michael Jordan | Dale Shuurmans

[1] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[2] Robert Tibshirani, et al. The Entire Regularization Path for the Support Vector Machine, 2004, J. Mach. Learn. Res.

[3] Katya Scheinberg, et al. Efficient SVM Training Using Low-Rank Kernel Representations, 2002, J. Mach. Learn. Res.

[4] Gregory Shakhnarovich, et al. An investigation of computational and informational limits in Gaussian mixture clustering, 2006, ICML '06.

[5] Inderjit S. Dhillon, et al. Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem, 2007, SDM.

[6] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, advances in kernel methods, 1999.

[7] Ji Zhu, et al. Boosting as a Regularized Path to a Maximum Margin Classifier, 2004, J. Mach. Learn. Res.

[8] Peng Sun, et al. Linear convergence of a modified Frank–Wolfe algorithm for computing minimum-volume enclosing ellipsoids, 2008, Optim. Methods Softw.

[9] Kilian Q. Weinberger, et al. Learning a kernel matrix for nonlinear dimensionality reduction, 2004, ICML.

[10] Stephen J. Wright, et al. Object-oriented software for quadratic programming, 2003, TOMS.

[11] Alexander J. Smola, et al. Learning with kernels, 1998.

[12] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[13] Michael C. Ferris, et al. Interior-Point Methods for Massive Support Vector Machines, 2002, SIAM J. Optim.

[14] Pannagadatta K. Shivaswamy. Ellipsoidal Kernel Machines, 2007.

[15] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[16] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[17] Ben Taskar, et al. Max-Margin Markov Networks, 2003, NIPS.

[18] Thorsten Joachims, et al. Making large-scale support vector machine learning practical, 1999.

[19] Katya Scheinberg, et al. An Efficient Implementation of an Active Set Method for SVMs, 2006, J. Mach. Learn. Res.

[20] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[21] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[22] Nathan Srebro, et al. Fast maximum margin matrix factorization for collaborative prediction, 2005, ICML.

[23] Michael I. Jordan, et al. A Direct Formulation for Sparse PCA Using Semidefinite Programming, 2004, SIAM Rev.

[24] Renato D. C. Monteiro, et al. Large-scale semidefinite programming via a saddle point Mirror-Prox algorithm, 2007, Math. Program.