STANDARDIZATION AND THE GROUP LASSO PENALTY.

We re-examine the original Group Lasso paper of Yuan and Lin (2006). The penalty in that paper appears to be designed for problems with uncorrelated features, but the statistical community has adopted it for general problems with correlated features. We show that in this general setting, a Group Lasso with a different choice of penalty matrix is generally more effective. We give insight into this formulation and show that it is intimately related to the uniformly most powerful invariant test for inclusion of a group. We demonstrate the efficacy of this method, the "standardized Group Lasso", over the usual Group Lasso on real and simulated data sets. We also extend it to the Ridged Group Lasso, which provides within-group regularization as needed. We discuss a simple algorithm based on group-wise coordinate descent to fit both the standardized Group Lasso and the Ridged Group Lasso.
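To make the group-wise coordinate descent idea concrete, here is a minimal sketch in Python. The abstract does not spell out the penalty, so the code assumes the standardized objective is 0.5*||y - sum_k X_k b_k||^2 + lam * sum_k sqrt(p_k) * ||X_k b_k||_2, that is, each group is penalized through its fitted values X_k b_k rather than through its coefficients b_k. Under that assumption, a QR decomposition X_k = Q_k R_k turns each group into an orthonormal block, and every block update has a closed form. The function name, its arguments, and the sqrt(p_k) weights are illustrative choices, not taken from the paper, and the Ridged variant is not covered.

```python
import numpy as np

def standardized_group_lasso(X, y, groups, lam, n_iter=1000, tol=1e-8):
    """Block coordinate descent for the ASSUMED objective
        0.5 * ||y - sum_k X_k b_k||^2 + lam * sum_k sqrt(p_k) * ||X_k b_k||_2.
    `groups` is a list of non-empty column-index lists; each X_k is
    assumed to have full column rank so that R_k is invertible.
    """
    # Orthonormalize each group: X_k = Q_k R_k, so ||X_k b_k|| = ||R_k b_k||.
    qr = [np.linalg.qr(X[:, g]) for g in groups]
    # theta_k = R_k b_k are the coefficients in the orthonormal basis.
    theta = [np.zeros(len(g)) for g in groups]
    resid = np.asarray(y, dtype=float).copy()

    for _ in range(n_iter):
        max_change = 0.0
        for k, (Q, R) in enumerate(qr):
            # Partial residual: add group k's current fit back in.
            r_k = resid + Q @ theta[k]
            z = Q.T @ r_k
            norm_z = np.linalg.norm(z)
            thresh = lam * np.sqrt(len(groups[k]))
            # Group soft-thresholding; exact because Q_k^T Q_k = I.
            if norm_z <= thresh:
                new = np.zeros_like(z)
            else:
                new = (1.0 - thresh / norm_z) * z
            max_change = max(max_change, np.max(np.abs(new - theta[k])))
            resid = r_k - Q @ new
            theta[k] = new
        if max_change < tol:
            break

    # Map back to the original coefficients: b_k = R_k^{-1} theta_k.
    beta = np.zeros(X.shape[1])
    for g, (Q, R), t in zip(groups, qr, theta):
        beta[g] = np.linalg.solve(R, t)
    return beta
```

A call might look like `standardized_group_lasso(X, y, [[0, 1, 2], [3, 4, 5]], lam=5.0)`. Because the threshold acts on the norm of Q_k^T r_k, a group is dropped or kept as a whole, and the decision does not depend on how the columns within the group are scaled or rotated.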

[1] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.

[2] Trevor Hastie, et al. Applications of the lasso and grouped lasso to the estimation of sparse graphical models, 2010.

[3] S. Geer, et al. High-dimensional additive modeling, 2008, 0806.4115.

[4] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[5] M. Drton, et al. Exact block-wise optimization in group lasso and sparse group lasso for linear regression, 2010, 1010.3320.

[6] P. Tseng. Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization, 2001.

[7] Ming Yuan, et al. Sparse Recovery in Large Ensembles of Kernel Machines, 2008, COLT.

[8] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of statistical software.

[9] Jean-Philippe Vert, et al. Group lasso with overlap and graph lasso, 2009, ICML '09.

[10] Martin J. Wainwright, et al. Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming, 2010, J. Mach. Learn. Res.

[11] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[12] Paul Tseng, et al. A coordinate gradient descent method for nonsmooth separable minimization, 2008, Math. Program.

[13] A. Hero, et al. A multidimensional shrinkage-thresholding operator, 2009, 2009 IEEE/SP 15th Workshop on Statistical Signal Processing.

[14] Bin Yu, et al. Minimax-optimal rates for sparse additive models over kernel classes via convex programming, 2012.

[15] P. Bühlmann, et al. The group lasso for logistic regression, 2008.

[16] J. Lafferty, et al. Sparse additive models, 2007, 0711.4555.

[17] V. Koltchinskii, et al. Sparsity in multiple kernel learning, 2010, 1211.2998.

[18] Y. Nesterov. Gradient methods for minimizing composite objective function, 2007.