Robust High Dimensional Sparse Regression and Matching Pursuit

We consider high dimensional sparse regression and develop strategies that can deal with arbitrary (possibly severe or coordinated) errors in the covariate matrix $X$. Such errors may come from corrupted data, persistent experimental errors, or malicious respondents in surveys and recommender systems. Non-stochastic error-in-variables problems of this kind are notoriously difficult to treat and, as we demonstrate, the difficulty is particularly pronounced in high-dimensional settings where the primary goal is support recovery of the sparse regressor. We develop algorithms for support recovery in sparse regression when some number $n_1$ out of $n+n_1$ total covariate/response pairs are arbitrarily (possibly maliciously) corrupted. We are interested in how many outliers, $n_1$, we can tolerate while still identifying the correct support. To the best of our knowledge, neither standard outlier rejection techniques, nor recently developed robust regression algorithms (which focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures can provide guarantees on support recovery. Perhaps surprisingly, we also show that the natural brute force algorithm, which searches over all subsets of $n$ covariate/response pairs and all subsets of possible support coordinates in order to minimize regression error, performs remarkably poorly: it is unable to correctly identify the support with even $n_1 = O(n/k)$ corrupted points, where $k$ is the sparsity. This is true even in the basic setting we consider, where all authentic measurements and noise are independent and sub-Gaussian. In this setting, we provide a simple algorithm, no more computationally taxing than OMP, that gives stronger performance guarantees, recovering the support with up to $n_1 = O(n/(\sqrt{k} \log p))$ corrupted points, where $p$ is the dimension of the signal to be recovered.
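The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the problem setting, not the method itself. It assumes one simple way to make a correlation-based (OMP-style) selection step robust: replace each inner product between a column of $X$ and the response with a trimmed inner product that discards the `n_trim` largest-magnitude terms. The function names, the trimming rule, and the particular corruption pattern in `make_corrupted_problem` are all assumptions introduced here for illustration.

```python
import numpy as np


def trimmed_inner_product(a, b, n_trim):
    """Sum of a_i * b_i after dropping the n_trim largest-magnitude products.

    Assumed robustification for illustration; not taken from the abstract.
    """
    prods = a * b
    if n_trim <= 0:
        return float(prods.sum())
    # Indices of the products sorted by magnitude; keep all but the top n_trim.
    keep = np.argsort(np.abs(prods))[: len(prods) - n_trim]
    return float(prods[keep].sum())


def robust_support_estimate(X, y, k, n_trim):
    """Return the k column indices with the largest |trimmed correlation| with y."""
    scores = np.array(
        [abs(trimmed_inner_product(X[:, j], y, n_trim)) for j in range(X.shape[1])]
    )
    return np.sort(np.argsort(scores)[-k:])


def make_corrupted_problem(n, n1, p, k, seed=0):
    """Hypothetical test instance: n authentic rows followed by n1 corrupted rows.

    Authentic rows are Gaussian (hence sub-Gaussian) with true support {0..k-1}.
    Corrupted rows point at an off-support coordinate (p-1) with a large value,
    one simple coordinated corruption among the arbitrary ones the setting allows.
    """
    rng = np.random.default_rng(seed)
    support = np.arange(k)
    beta = np.zeros(p)
    beta[support] = 1.0
    X = rng.standard_normal((n + n1, p))
    y = X @ beta + 0.1 * rng.standard_normal(n + n1)
    X[n:, :] = 0.0
    X[n:, p - 1] = 50.0
    y[n:] = 50.0
    return X, y, support


if __name__ == "__main__":
    X, y, support = make_corrupted_problem(n=2000, n1=20, p=50, k=3)
    print("true support:   ", support)
    print("plain top-k:    ", robust_support_estimate(X, y, k=3, n_trim=0))
    print("trimmed top-k:  ", robust_support_estimate(X, y, k=3, n_trim=20))
```

Setting `n_trim=0` gives the ordinary, non-robust correlation ranking, so the two calls compare selection with and without trimming on the same corrupted data. The sketch selects all $k$ coordinates in one pass rather than iterating with residual updates as OMP does; that simplification is also an assumption.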
