Efficient Euclidean projections via Piecewise Root Finding and its application in gradient projection

Recently, the gradient (subgradient) projection method, especially when combined with Nesterov's acceleration, has attracted increasing attention and achieved great success on constrained optimization problems arising in machine learning, data mining, and signal processing. A critical step in the gradient projection method is efficiently projecting a vector onto a constraint set. In this paper, we propose a unified method called Piecewise Root Finding (PRF) to efficiently compute Euclidean projections onto three typical constraint sets: the ℓ1-ball, the Elastic Net (EN), and the Intersection of a Hyperplane and a Halfspace (IHH). In our PRF method, we first formulate the Euclidean projection problem as a root-finding problem. A piecewise root-finding algorithm is then applied to locate the root, with global convergence guaranteed. Finally, the Euclidean projection is obtained in closed form as a function of the found root. Moreover, the sparsity of the projected vector is exploited, reducing the computational cost of projection onto the ℓ1-ball and EN. Empirical comparisons with several state-of-the-art algorithms for Euclidean projection onto these three constraint sets demonstrate the efficiency of PRF. In addition, we apply our efficient projection algorithm within Gradient Projection with Nesterov's Method (GPNM), which efficiently solves the popular logistic regression problem under an ℓ1-ball/EN/IHH constraint. Experimental results on real-world data sets indicate that GPNM converges quickly.
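To illustrate the root-finding view in the simplest case, consider projection onto the ℓ1-ball {x : ||x||_1 <= z}. The projection has the closed form x_i = sign(v_i) · max(|v_i| − θ, 0), where θ ≥ 0 is the root of the piecewise-linear, decreasing function g(θ) = Σ_i max(|v_i| − θ, 0) − z. The Python sketch below solves this same root equation, but locates the active linear piece by sorting the breakpoints |v_i|; it is a minimal illustration of the formulation, not the paper's PRF iteration, which finds the root without a full sort.

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Project v onto the l1-ball {x : ||x||_1 <= z} via root finding.

    The projection is x_i = sign(v_i) * max(|v_i| - theta, 0), where
    theta >= 0 is the root of the piecewise-linear decreasing function
        g(theta) = sum_i max(|v_i| - theta, 0) - z.
    Here the root is located by sorting the breakpoints |v_i| (a standard
    sort-based scheme); PRF solves the same equation without a full sort.
    """
    u = np.abs(v)
    if u.sum() <= z:               # already feasible: projection is v itself
        return v.copy()
    s = np.sort(u)[::-1]           # breakpoints of g, in decreasing order
    cumsum = np.cumsum(s)
    ks = np.arange(1, len(s) + 1)
    # largest k with s[k-1] > (cumsum[k-1] - z)/k: the root lies on that piece
    k = np.max(np.nonzero(s > (cumsum - z) / ks)[0]) + 1
    theta = (cumsum[k - 1] - z) / k    # closed-form root on the active piece
    return np.sign(v) * np.maximum(u - theta, 0.0)
```

Between consecutive breakpoints, g is linear, so once the active piece is identified the root θ (and hence the projection) is available in closed form; this piecewise-linear structure is exactly what a piecewise root-finding scheme exploits.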

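The final application, Gradient Projection with Nesterov's Method (GPNM), alternates a Nesterov extrapolation step, a gradient step on the logistic loss, and a Euclidean projection back onto the constraint set. Below is a minimal sketch for the ℓ1-ball-constrained case, reusing project_l1_ball from above. The fixed step size 1/L with the Lipschitz estimate L = ||X||_2^2 / (4n), the labels in {−1, +1}, and the iteration count n_iter are illustrative assumptions, not the paper's exact configuration (which may, for example, use a line search).

```python
import numpy as np

def gpnm_logistic_l1(X, y, z, L=None, n_iter=200):
    """Sketch of GPNM for l1-ball-constrained logistic regression:
        min_w (1/n) * sum_i log(1 + exp(-y_i * x_i^T w))  s.t. ||w||_1 <= z,
    with labels y_i in {-1, +1}. Assumes a fixed step 1/L, where
    L = ||X||_2^2 / (4n) upper-bounds the Lipschitz constant of the gradient.
    """
    n, d = X.shape
    if L is None:
        L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    w = np.zeros(d)
    w_prev = w.copy()
    t_prev, t = 1.0, 1.0
    for _ in range(n_iter):
        # Nesterov extrapolation point
        s = w + ((t_prev - 1.0) / t) * (w - w_prev)
        # gradient of the averaged logistic loss at s
        # (p_i = sigmoid(-y_i * x_i^T s); may overflow for huge margins,
        # which a careful implementation would guard against)
        p = 1.0 / (1.0 + np.exp(y * (X @ s)))
        grad = -(X.T @ (y * p)) / n
        w_prev = w
        w = project_l1_ball(s - grad / L, z)   # projected gradient step
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return w
```

Because the per-iteration cost is one gradient evaluation plus one projection, the overall efficiency of GPNM hinges on making the projection step cheap, which is precisely what PRF targets.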