A Survey of L1 Regression

L1 regularization, or regularization with an L1 penalty, is a popular idea in statistics and machine learning. This paper reviews the concept and application of L1 regularization for regression. Our aim is not to present a comprehensive list of uses of the L1 penalty in the regression setting; rather, we focus on what we believe are the most representative uses of this regularization technique, which we describe in some detail. We thus cover a number of L1-regularized methods for linear regression, generalized linear models, and time series analysis. Although this review targets practice rather than theory, we do give some theoretical details about L1-penalized linear regression, usually referred to as the least absolute shrinkage and selection operator (lasso).
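For concreteness, the criterion under review can be stated as follows. In a linear model with response $y \in \mathbb{R}^n$ and design matrix $X \in \mathbb{R}^{n \times p}$, the lasso estimate is

$$
\hat{\beta}^{\mathrm{lasso}} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1,
$$

where $\lambda \ge 0$ controls the strength of the penalty (we adopt the common $1/(2n)$ scaling of the squared-error term; other conventions in the literature differ only by a rescaling of $\lambda$). The $\ell_1$ penalty shrinks coefficients toward zero and, for sufficiently large $\lambda$, sets some of them exactly to zero, which is what makes the lasso a variable-selection method as well as a shrinkage method.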
