Scalable Robust Principal Component Analysis Using Grassmann Averages

In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. Using a trimmed average within RGA yields the Trimmed Grassmann Average (TGA), which is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie, a task beyond any current method. Source code is available online.
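The abstract does not spell out the GA iteration, so the following is only a minimal sketch under stated assumptions, not the authors' released code. It assumes that each zero-mean observation is normalized to a unit vector, weighted by its norm, sign-aligned with the current estimate, and then averaged. The function name `grassmann_average` and the parameters `n_iter`, `tol`, and `seed` are hypothetical and chosen for illustration.

```python
import numpy as np


def grassmann_average(X, n_iter=100, tol=1e-10, seed=0):
    """Sketch: estimate the leading 1-D subspace of zero-mean data X (N x D).

    Assumed scheme (not confirmed by the abstract): a sign-aligned,
    norm-weighted average of unit vectors, iterated to a fixed point.
    """
    norms = np.linalg.norm(X, axis=1)
    keep = norms > 0                      # all-zero rows span no subspace
    U = X[keep] / norms[keep, None]       # unit vectors, one per observation
    w = norms[keep]                       # assumed weights: observation norms

    rng = np.random.default_rng(seed)
    q = rng.standard_normal(X.shape[1])   # random initial direction
    q /= np.linalg.norm(q)

    for _ in range(n_iter):
        # Flip each unit vector so it points into the same half-space as q.
        signs = np.sign(U @ q)
        signs[signs == 0] = 1.0
        # Weighted average of the aligned vectors, renormalized to unit length.
        q_new = (signs * w) @ U
        q_new /= np.linalg.norm(q_new)
        converged = 1.0 - abs(q_new @ q) < tol
        q = q_new
        if converged:
            break
    return q
```

Each iteration touches every observation once, so the cost per iteration is linear in the size of the data, in line with the scalability argument above. A robust variant in the spirit of TGA would presumably replace the plain weighted average with an element-wise trimmed average; that step is not shown here.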
