The squared-error of generalized LASSO: A precise analysis

We consider the problem of estimating an unknown but structured signal x<sub>0</sub> from its noisy linear observations y = Ax<sub>0</sub> + z ∈ ℝ<sup>m</sup>. The structure of x<sub>0</sub> is captured by a structure-inducing convex function f(·). We assume that the entries of A are i.i.d. standard normal N(0, 1) and z ~ N(0, σ<sup>2</sup>I<sub>m</sub>). As a measure of the performance of an estimate x* of x<sub>0</sub>, we consider the “Normalized Square Error” (NSE) ∥x* − x<sub>0</sub>∥<sub>2</sub><sup>2</sup>/σ<sup>2</sup>. For sufficiently small σ, we characterize the exact performance of two different versions of the well-known LASSO algorithm. The first estimator is obtained by solving argmin<sub>x</sub> ∥y − Ax∥<sub>2</sub> + λf(x). As a function of λ, we identify three distinct regions of operation and argue that “R<sub>ON</sub>” is the most interesting among them. When λ ∈ R<sub>ON</sub>, we show that for small σ the NSE is D<sub>f</sub>(x<sub>0</sub>, λ)/(m − D<sub>f</sub>(x<sub>0</sub>, λ)), where D<sub>f</sub>(x<sub>0</sub>, λ) is the expected squared distance of an i.i.d. standard normal vector to the dilated subdifferential λ · ∂f(x<sub>0</sub>). Second, we consider the more popular estimator argmin<sub>x</sub> (1/2)∥y − Ax∥<sub>2</sub><sup>2</sup> + στf(x). We propose a formula for the NSE of this estimator by establishing a suitable mapping between it and the first estimator over the region R<sub>ON</sub>. As a useful side result, we find explicit formulae for the optimal estimation performance and the optimal penalty parameters λ* and τ*.
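To make the NSE prediction concrete, the following is a minimal Monte Carlo sketch for the special case f(·) = ∥·∥<sub>1</sub> and a k-sparse x<sub>0</sub>, where λ · ∂f(x<sub>0</sub>) has a simple coordinate-wise description: on the support of x<sub>0</sub> it is the singleton {λ sign(x<sub>0,i</sub>)}, and off the support it is the interval [−λ, λ]. The helper names (dist_sq_to_l1_subdiff, predicted_nse) and all concrete parameter values (n, m, k, lam, num_trials) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dist_sq_to_l1_subdiff(g, x0, lam):
    """Squared distance of g to the dilated subdifferential lam * d||.||_1(x0).

    On the support of x0 the set is the single point lam*sign(x0_i), so the
    distance is |g_i - lam*sign(x0_i)|.  Off the support the set is the
    interval [-lam, lam], whose nearest point to g_i is the clipped value,
    giving the soft-threshold residual max(|g_i| - lam, 0).
    """
    on = x0 != 0
    d_on = g[on] - lam * np.sign(x0[on])
    d_off = np.maximum(np.abs(g[~on]) - lam, 0.0)
    return np.sum(d_on**2) + np.sum(d_off**2)

def predicted_nse(x0, m, lam, num_trials=2000, seed=None):
    """Estimate D_f(x0, lam) by Monte Carlo and return D / (m - D)."""
    rng = np.random.default_rng(seed)
    n = x0.size
    D = np.mean([dist_sq_to_l1_subdiff(rng.standard_normal(n), x0, lam)
                 for _ in range(num_trials)])
    # The small-sigma formula is only meaningful when D_f(x0, lam) < m,
    # i.e. when lam lies in the region R_ON.
    assert D < m, "need D_f(x0, lam) < m (lam in R_ON)"
    return D / (m - D)

# Illustrative example: a k-sparse signal in R^n, m Gaussian measurements.
n, m, k, lam = 500, 250, 25, 1.5
x0 = np.zeros(n)
x0[:k] = 1.0
print(predicted_nse(x0, m, lam, seed=0))
```

Note how the formula behaves at the edges: as λ shrinks toward the boundary of R<sub>ON</sub>, D<sub>f</sub>(x<sub>0</sub>, λ) approaches m and the predicted NSE D<sub>f</sub>/(m − D<sub>f</sub>) blows up, while larger λ (within R<sub>ON</sub>) shrinks the expected distance term coming from the off-support coordinates.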
