A Survey of L1 Regression

L1 regularization, or regularization with an L1 penalty, is a popular idea in statistics and machine learning. This paper reviews the concept and application of L1 regularization for regression. Our aim is not to present a comprehensive list of uses of the L1 penalty in the regression setting; rather, we focus on what we believe are the most representative uses of this regularization technique, which we describe in some detail. We thus cover a number of L1-regularized methods for linear regression, generalized linear models, and time series analysis. Although this review targets practice rather than theory, we do give some theoretical details about L1-penalized linear regression, usually referred to as the least absolute shrinkage and selection operator (lasso).
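For concreteness, the criterion under review can be stated as follows. In a linear model with response $y \in \mathbb{R}^n$ and design matrix $X \in \mathbb{R}^{n \times p}$, the lasso estimate is

$$
\hat{\beta}^{\mathrm{lasso}} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1,
$$

where $\lambda \ge 0$ controls the strength of the penalty (we adopt the common $1/(2n)$ scaling of the squared-error term; other conventions in the literature differ only by a rescaling of $\lambda$). The $\ell_1$ penalty shrinks coefficients toward zero and, for sufficiently large $\lambda$, sets some of them exactly to zero, which is what makes the lasso a variable-selection method as well as a shrinkage method.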
