Outlier Robust Online Learning

We consider the problem of learning from noisy data in practical settings where the data set is too large to store on a single machine. More challengingly, data collected in the wild may contain malicious outliers. To address both the scalability and the robustness issues, we present an online robust learning (ORL) approach. ORL is simple to implement and has a provable robustness guarantee, in stark contrast to existing online learning approaches, which are generally fragile to outliers. We specialize the ORL approach to two concrete cases: online robust principal component analysis and online linear regression. We demonstrate the efficiency and robustness advantages of ORL through comprehensive simulations and an image tag prediction task on a large-scale data set. We also discuss an extension of ORL to distributed learning and provide experimental evaluations.
