Statistical Learning with Sparsity: The Lasso and Generalizations

Discover New Methods for Dealing with High-Dimensional Data

A sparse statistical model has only a small number of nonzero parameters or weights; it is therefore much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. The authors, top experts in this rapidly evolving field, describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of ℓ1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso.

In this age of big data, the number of features measured on a person or object can be large, and may exceed the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
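To make the description of "the lasso for linear regression and a simple coordinate descent algorithm for its computation" concrete, here is a minimal illustrative sketch (not code from the book): cyclic coordinate descent with soft-thresholding applied to the lasso objective (1/(2n))||y - X beta||^2 + lambda*||beta||_1. The function and variable names (lasso_coordinate_descent, soft_threshold) are invented for this example, and the columns of X are assumed to be roughly standardized.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100, tol=1e-6):
    """Cyclic coordinate descent for
        minimize (1/(2n)) * ||y - X @ beta||^2 + lam * ||beta||_1.
    Illustrative sketch; assumes columns of X are roughly standardized."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j^T x_j for each column j
    resid = y - X @ beta                    # current residual y - X beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Univariate least-squares fit of coordinate j to the partial residual
            rho = (X[:, j] @ resid) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            if new_bj != beta[j]:
                resid -= X[:, j] * (new_bj - beta[j])   # keep residual in sync
                max_change = max(max_change, abs(new_bj - beta[j]))
                beta[j] = new_bj
        if max_change < tol:                             # stop when updates stall
            break
    return beta

# Example usage on synthetic data with a sparse true coefficient vector.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    print(lasso_coordinate_descent(X, y, lam=0.1))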
