Regularized Linear Regression: A Precise Analysis of the Estimation Error

Non-smooth regularized convex optimization procedures have emerged as a powerful tool for recovering structured signals (sparse, low-rank, etc.) from (possibly compressed) noisy linear measurements. We focus on the problem of linear regression and consider a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations, plus a structure-inducing regularization term. Celebrated instances include the LASSO, the group LASSO, and the least-absolute-deviations method. We develop a general framework for determining precise prediction performance guarantees (e.g., the mean squared error) of such methods for the case of a Gaussian measurement ensemble. The machinery builds upon Gordon’s Gaussian min-max theorem under additional convexity assumptions that arise in many practical applications. This theorem associates with a primary optimization (PO) problem a simplified auxiliary optimization (AO) problem from which one can tightly infer properties of the original (PO), such as the optimal cost, the norm of the optimal solution, etc. Our theory applies to general loss functions and regularizers, and it provides guidelines on how to optimally tune the regularization coefficient when certain structural properties (such as the sparsity level, rank, etc.) are known.
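
As a minimal sketch of the setup (the notation below is assumed, not quoted from the paper): given measurements $y = A x_0 + z$ with a Gaussian matrix $A \in \mathbb{R}^{m \times n}$, the methods in question solve a regularized program of the form

$$\hat{x} \in \arg\min_{x} \ \mathcal{L}(y - A x) + \lambda f(x),$$

where $\mathcal{L}$ is a convex loss (the squared loss for the LASSO, the $\ell_1$ loss for least absolute deviations) and $f$ is a convex structure-inducing regularizer (the $\ell_1$ norm for sparsity, the nuclear norm for low rank). One common way to state the Gaussian min-max correspondence is to pair a primary optimization problem

$$\Phi(G) = \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u} \ u^{T} G w + \psi(w, u)$$

with the auxiliary optimization problem

$$\phi(g, h) = \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u} \ \|w\|_2 \, g^{T} u + \|u\|_2 \, h^{T} w + \psi(w, u),$$

in which the Gaussian matrix $G$ is replaced by two independent Gaussian vectors $g$ and $h$. Under convex-concavity of $\psi$ and convexity of the constraint sets, concentration of the optimal cost and minimizer of the (AO) transfers tightly to the (PO), which is how precise error characterizations of this kind are obtained.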
