A quasi-Newton approach to non-smooth convex optimization

We extend the well-known BFGS quasi-Newton method and its limited-memory variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that, under some technical conditions, the resulting subBFGS algorithm is globally convergent in objective function value. We apply its limited-memory variant (subLBFGS) to L2-regularized risk minimization with the binary hinge loss. To extend our algorithm to the multiclass and multilabel settings, we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that our line search can also be used to extend a recently developed bundle method to the multiclass and multilabel settings. We also apply the direction-finding component of our algorithm to L1-regularized risk minimization with the logistic loss. In all these contexts our methods perform comparably to or better than specialized state-of-the-art solvers on a number of publicly available data sets. An open-source implementation of our algorithms is freely available.
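
The direction-finding idea summarized above can be sketched in a few lines of code. The sketch below is a simplified, illustrative reconstruction, not the paper's exact algorithm: it iteratively mixes subgradients so as to shrink their B-weighted norm until the resulting quasi-Newton direction descends against every element of the subdifferential. The names `subbfgs_direction` and `sup_subgrad` are our own, and the oracle returning a maximizing subgradient along a trial direction is assumed to be available for the loss at hand.

```python
import numpy as np

def subbfgs_direction(sup_subgrad, g0, B, max_iter=50, eps=1e-8):
    """Illustrative sketch of quasi-Newton direction finding at a nonsmooth point.

    sup_subgrad(p): returns an element of argmax_{g in df(x)} g.p,
                    i.e. the worst-case subgradient along a trial direction p.
    g0:             an arbitrary initial subgradient at the current iterate.
    B:              positive-definite inverse-Hessian approximation from (L)BFGS.
    """
    g = np.asarray(g0, dtype=float)
    for _ in range(max_iter):
        p = -B @ g                    # trial direction from the local quadratic model
        g_bar = sup_subgrad(p)        # subgradient that most opposes p
        if g_bar @ p <= -eps:         # p descends against every subgradient: done
            return p
        # Mix g toward g_bar to shrink the B-weighted norm of g, minimizing
        # q(eta) = 0.5 * ((1-eta)*g + eta*g_bar)^T B ((1-eta)*g + eta*g_bar)
        d = g_bar - g
        dBd = d @ B @ d
        eta = 1.0 if dBd <= 0 else np.clip(-(g @ B @ d) / dBd, 0.0, 1.0)
        g = g + eta * d
    return -B @ g                     # a near-zero p signals approximate optimality

# Example: f(x) = ||x||_1 at x = (2, 0, 0), where the subdifferential
# is {1} x [-1, 1] x [-1, 1]; a maximizing subgradient picks sign(p_i)
# in each coordinate where f has a kink.
def sup_subgrad(p):
    return np.array([1.0, np.sign(p[1]), np.sign(p[2])])

p = subbfgs_direction(sup_subgrad, g0=np.array([1.0, 1.0, -1.0]), B=np.eye(3))
# p -> (-1, 0, 0): a direction along which f decreases despite the kinks.
```

In this simplified form the mixing step has a closed-form optimal coefficient because the model is quadratic in eta; the paper's actual method additionally generalizes the Wolfe conditions to subdifferentials for the subsequent line search.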
