Outlier-Robust PCA: The High-Dimensional Case

Principal component analysis plays a central role in statistics, engineering, and science. Because of the prevalence of corrupted data in real-world applications, much research has focused on developing robust algorithms. Perhaps surprisingly, these algorithms are unequipped, indeed unable, to deal with outliers in the high-dimensional setting, where the number of observations is of the same magnitude as the number of variables of each observation and the dataset contains some (arbitrarily) corrupted observations. We propose a high-dimensional robust principal component analysis algorithm that is efficient, robust to contaminated points, and easily kernelizable. In particular, our algorithm achieves maximal robustness: it has a breakdown point of 50% (the best possible), while all existing algorithms have a breakdown point of zero. Moreover, our algorithm recovers the optimal solution exactly in the case where the number of corrupted points grows sublinearly in the dimension.
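
To make the notion of a zero breakdown point concrete, the following minimal sketch shows how a single arbitrarily corrupted observation can redirect the leading principal component of classical (non-robust) PCA when the number of observations equals the number of variables. It does not implement the algorithm proposed here; the data model, the dimensions, the corruption magnitude, and the helper name `top_pc` are all illustrative assumptions.

```python
import numpy as np

def top_pc(X):
    """Leading principal direction of the rows of X, via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
n, p = 100, 100  # assumed sizes: as many observations as variables

# Authentic points: large variance along the first axis, small isotropic noise.
true_dir = np.zeros(p)
true_dir[0] = 1.0
X = 5.0 * rng.standard_normal((n, 1)) * true_dir + 0.1 * rng.standard_normal((n, p))

# Replace one observation with an arbitrarily large point along the second axis.
outlier_dir = np.zeros(p)
outlier_dir[1] = 1.0
X_bad = X.copy()
X_bad[0] = 1e4 * outlier_dir

# Absolute cosine between the estimated direction and the true one.
clean_alignment = abs(top_pc(X) @ true_dir)
corrupt_alignment = abs(top_pc(X_bad) @ true_dir)
print(f"clean: {clean_alignment:.3f}, one outlier: {corrupt_alignment:.3f}")
```

On clean data the estimated direction should align closely with the true one, while a single corrupted row is enough to pull it toward the outlier. A breakdown point of 50% means the estimate would stay bounded until half of the points are corrupted.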
