Online PCA for Contaminated Data

We consider online Principal Component Analysis (PCA) in which contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Because previous online PCA algorithms are sensitive to outliers, they fail in this setting, and their results can be arbitrarily skewed by the outliers. We propose an online robust PCA (online RPCA) algorithm that steadily improves the PCs estimate from an initial one, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. In fact, under mild conditions, online RPCA achieves the maximal robustness, with a 50% breakdown point. Moreover, online RPCA is shown to be efficient in both storage and computation, since it need not revisit previous samples as traditional robust PCA algorithms do. This makes online RPCA scalable to large-scale data.
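The claim that outliers can arbitrarily skew standard PCA can be illustrated with a minimal sketch. This is not the proposed online RPCA algorithm; it only shows the sensitivity of ordinary (non-robust) PCA on hypothetical 2-D data. The function names, the synthetic data, and the outlier position are all assumptions made for illustration.

```python
import math
import random

def top_pc_2d(points):
    """Leading principal direction of 2-D points (hypothetical helper).

    Uses the closed-form major-axis angle of the 2x2 sample covariance.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def angle_to_x_axis_deg(v):
    """Angle in degrees (0..90) between unit direction v and the x-axis."""
    return math.degrees(math.acos(min(1.0, abs(v[0]))))

# Assumed synthetic data: most variance lies along the x-axis.
rng = random.Random(0)
clean = [(rng.gauss(0, 3.0), rng.gauss(0, 0.3)) for _ in range(200)]

# A single far-away sample along the y-axis plays the role of an outlier.
contaminated = clean + [(0.0, 1000.0)]

clean_angle = angle_to_x_axis_deg(top_pc_2d(clean))
contaminated_angle = angle_to_x_axis_deg(top_pc_2d(contaminated))

print(f"clean PC angle to x-axis:        {clean_angle:.2f} degrees")
print(f"contaminated PC angle to x-axis: {contaminated_angle:.2f} degrees")
```

In this sketch a single outlier among 201 samples is enough to rotate the estimated leading direction from roughly the x-axis to roughly the y-axis, which is the failure mode a robust estimator is meant to resist.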
