Robust Regression and Lasso

Lasso, or ℓ1-regularized least squares, has been explored extensively for its remarkable sparsity properties. This paper shows that the Lasso solution, in addition to being sparse, is robust: it is the solution to a robust optimization problem. This has two important consequences. First, robustness connects the regularizer to a physical property, namely protection from noise. This allows a principled choice of regularizer; in particular, generalizations of Lasso that still yield convex optimization problems are obtained by considering different uncertainty sets. Second, robustness can itself be used as an avenue for exploring properties of the solution. In particular, it is shown that the robustness of the solution explains why the solution is sparse. The analysis, as well as the specific results obtained, differs from standard sparsity results, providing different geometric intuition. Furthermore, the robust optimization formulation is shown to be related to kernel density estimation, and based on this connection, a proof that Lasso is consistent is given using robustness directly. Finally, a theorem is proved showing that sparsity and algorithmic stability contradict each other, and hence Lasso is not stable.
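To make the robustness claim concrete, the following is a sketch of the central equivalence in the flavor stated in the paper, with notation assumed here: $y \in \mathbb{R}^n$ is the response, $A \in \mathbb{R}^{n \times m}$ the design matrix, and the uncertainty set perturbs each column $i$ of $A$ by a vector $\delta_i$ with $\|\delta_i\|_2 \le c$:

$$
\min_{\beta}\; \max_{\|\delta_i\|_2 \le c,\; i=1,\dots,m} \big\| y - (A + \Delta)\beta \big\|_2 \;=\; \min_{\beta}\; \Big( \|y - A\beta\|_2 + c \sum_{i=1}^{m} |\beta_i| \Big),
$$

where $\Delta = [\delta_1, \dots, \delta_m]$. The right-hand side is an $\ell_1$-penalized least-squares problem, so protecting against feature-wise noise of magnitude $c$ recovers exactly the Lasso-style regularizer, with the noise budget $c$ playing the role of the regularization coefficient.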

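Since the inner maximum above admits a closed form (the worst-case perturbation aligns every column $\delta_i$ with the residual direction, scaled by $-c\,\mathrm{sign}(\beta_i)$), the equivalence can be checked numerically. The sketch below assumes nothing beyond NumPy, and all variable names are illustrative, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 50, 10, 0.3                      # samples, features, noise budget
A = rng.standard_normal((n, m))            # design matrix
y = rng.standard_normal(n)                 # response
beta = rng.standard_normal(m)              # an arbitrary candidate solution

r = y - A @ beta                           # nominal residual
u = r / np.linalg.norm(r)                  # unit vector along the residual

# Worst-case column perturbations: delta_i = -c * sign(beta_i) * u,
# each of which satisfies the column-norm budget ||delta_i||_2 <= c.
Delta = -c * np.outer(u, np.sign(beta))

worst_case_loss = np.linalg.norm(y - (A + Delta) @ beta)
regularized_loss = np.linalg.norm(r) + c * np.abs(beta).sum()

# The two quantities agree to machine precision for any beta.
print(worst_case_loss, regularized_loss)
```

Because the identity holds pointwise in $\beta$, minimizing either side yields the same optimizer, which is the sense in which the Lasso solution solves a robust regression problem.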