Optimal Mean Estimation without a Variance

We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. Concretely, we are given a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ satisfying the following \emph{weak-moment} assumption for some ${\alpha \in [0, 1]}$: \begin{equation*} \forall \, \|v\| = 1: \quad \mathbb{E}_{X \sim \mathcal{D}}\bigl[\lvert \langle X - \mu, v\rangle \rvert^{1 + \alpha}\bigr] \leq 1, \end{equation*} together with a target failure probability $\delta$, and our goal is to design an estimator attaining the smallest possible confidence interval as a function of $n$, $d$, and $\delta$. For the specific case of $\alpha = 1$, foundational work of Lugosi and Mendelson exhibits an estimator achieving subgaussian confidence intervals, and subsequent work has led to computationally efficient versions of this estimator. Here, we study the case of general $\alpha$ and establish the following information-theoretic lower bound on the optimal attainable confidence interval: \begin{equation*} \Omega \left(\sqrt{\frac{d}{n}} + \left(\frac{d}{n}\right)^{\frac{\alpha}{1 + \alpha}} + \left(\frac{\log(1/\delta)}{n}\right)^{\frac{\alpha}{1 + \alpha}}\right). \end{equation*} Moreover, we devise a computationally efficient estimator that achieves this lower bound.
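To make the setting concrete, the following is a minimal sketch of the classical median-of-means estimator, a standard building block in this line of work. It is not the optimal estimator devised in the paper; it only illustrates how block-median aggregation yields a robust mean estimate when the data may lack a variance (the coordinate-wise median is used here for simplicity; the geometric median of the block means gives stronger high-dimensional guarantees).

```python
import numpy as np

def median_of_means(X, k):
    """Median-of-means estimate of the mean of the rows of X.

    X : (n, d) array of i.i.d. samples.
    k : number of blocks (roughly log(1/delta) for failure probability delta).

    Splits the sample into k equal-size blocks, averages each block,
    and returns the coordinate-wise median of the block means. Even
    when individual samples are heavy-tailed, most block means land
    near the true mean, so their median is robust to the outliers.
    """
    n = X.shape[0]
    m = (n // k) * k                       # drop the remainder samples
    blocks = np.split(X[:m], k)            # k blocks of size n // k
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)
```

For instance, on samples from a Student-$t$ distribution with $1.5$ degrees of freedom (finite mean, infinite variance, i.e. a weak-moment regime with $\alpha < 1$), the block-median aggregate concentrates around the true mean far more reliably than the empirical mean does.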
