Fast Projections onto ℓ1,q-Norm Balls for Grouped Feature Selection

Joint sparsity is widely acknowledged as a powerful structural cue for feature selection in settings where variables are expected to exhibit "grouped" behavior. Such grouped behavior is commonly modeled by Group-Lasso or Multitask-Lasso-type problems, where feature selection is effected via ℓ1,q mixed norms. Several particular formulations for modeling groupwise sparsity have received substantial attention in the literature, and in some cases efficient algorithms are also available. Surprisingly, for constrained formulations of fundamental importance (e.g., regression with an ℓ1,∞-norm constraint), highly scalable methods seem to be missing. We address this deficiency by presenting a method based on spectral projected gradient (SPG) that can tackle ℓ1,q-constrained convex regression problems. The crucial component of our method is an algorithm for projecting onto ℓ1,q-norm balls. We present several numerical results showing that our methods attain speedups of up to 30× on large ℓ1,∞ multitask lasso problems. The gains on the ℓ1,∞-projection subproblem alone are even more dramatic: we observe speedups of almost three orders of magnitude over the current standard method.
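To make the two ingredients above concrete, the minimal Python sketch below illustrates (i) Euclidean projection onto an ℓ1,2-norm ball, which reduces to projecting the vector of row norms onto an ℓ1 ball and rescaling the rows, and (ii) a bare-bones projected-gradient loop with a Barzilai–Borwein step size for ℓ1,2-constrained least squares. This is an illustration under simplifying assumptions, not the paper's method: the paper targets the harder ℓ1,∞ case, and a full SPG method additionally employs a nonmonotone line search, which is omitted here; all function names are hypothetical.

```python
import numpy as np

def project_l1_ball(c, tau):
    """Project a nonnegative vector c onto {d >= 0 : sum(d) <= tau}
    using the standard sort-and-threshold scheme (O(n log n))."""
    if c.sum() <= tau:
        return c.copy()
    u = np.sort(c)[::-1]                     # sorted descending
    css = np.cumsum(u)
    # largest index k (0-based) with u[k] > (css[k] - tau) / (k + 1)
    k = np.nonzero(u * np.arange(1, c.size + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1.0)
    return np.maximum(c - theta, 0.0)

def project_l12_ball(Y, tau):
    """Project a matrix Y onto {X : sum_i ||row_i(X)||_2 <= tau}.
    The row norms are projected onto the l1 ball, then each row is
    rescaled accordingly."""
    c = np.linalg.norm(Y, axis=1)
    if c.sum() <= tau:
        return Y.copy()
    d = project_l1_ball(c, tau)
    scale = np.divide(d, c, out=np.zeros_like(c), where=c > 0)
    return Y * scale[:, None]

def pg_bb_multitask(A, B, tau, iters=200):
    """Projected gradient with a Barzilai-Borwein step for
        min_X 0.5 * ||A X - B||_F^2  s.t.  ||X||_{1,2} <= tau.
    (Full SPG also uses a nonmonotone line search; omitted here.)"""
    X = np.zeros((A.shape[1], B.shape[1]))
    G = A.T @ (A @ X - B)                    # gradient of the objective
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # safe initial step length
    for _ in range(iters):
        X_new = project_l12_ball(X - step * G, tau)
        G_new = A.T @ (A @ X_new - B)
        s, y = X_new - X, G_new - G
        sy = np.sum(s * y)
        if sy > 0:                           # BB1 step length update
            step = np.sum(s * s) / sy
        X, G = X_new, G_new
    return X
```

The reduction to a row-norm problem is what makes the ℓ1,2 projection cheap; for q = ∞ the projection no longer decouples across rows this way, which is precisely why the ℓ1,∞ case calls for a specialized projection algorithm.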
