Abstract: There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state-of-the-art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [BCL13] and a lower bound by [DLLO16], characterizing the big-O optimal errors for distributions for which only a $1+\alpha$ moment exists for $\alpha \in (0,1)$. Both results, however, are optimal only in the worst case. We initiate the fine-grained study of the mean estimation problem: Can algorithms leverage useful features of the input distribution to beat the sub-Gaussian rate, without explicit knowledge of such features? We resolve this question with an unexpectedly nuanced answer: "Yes in limited regimes, but in general no". For any distribution $p$ with a finite mean, we construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$ and $q$ are not distinguishable with high probability, and $q$ further preserves $p$'s moments up to constants. The main consequence is that no reasonable estimator can asymptotically achieve better than the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]. More generally, we introduce a new definitional framework to analyze the fine-grained optimality of algorithms, which we call "neighborhood optimality", interpolating between the unattainably strong "instance optimality" and the trivially weak "admissibility" definitions. Applying the new framework, we show that median-of-means is neighborhood optimal, up to constant factors. Finding a neighborhood-optimal estimator without constant-factor slackness remains open.
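For readers unfamiliar with the median-of-means algorithm referenced above, the following is a minimal Python sketch of its standard textbook form: split the samples into groups, average each group, and return the median of the group averages. It is not taken from the paper. The function name `median_of_means`, the failure-probability parameter `delta`, and the group count of roughly $8\ln(1/\delta)$ are assumptions chosen for illustration; the analyses in [BCL13] and in this work may use different constants.

```python
import math
import statistics


def median_of_means(samples, delta):
    """Illustrative median-of-means estimate of the mean of `samples`.

    `delta` is the target failure probability. The number of groups,
    ceil(8 * ln(1/delta)), is a common textbook choice and is an
    assumption here, not a constant taken from the paper.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")

    # Number of groups, capped at n so that no group is empty.
    k = max(1, min(n, math.ceil(8 * math.log(1 / delta))))

    # Split the samples into k consecutive groups of nearly equal size
    # and compute the empirical mean of each group.
    group_means = []
    for i in range(k):
        lo = i * n // k
        hi = (i + 1) * n // k
        group = samples[lo:hi]
        group_means.append(sum(group) / len(group))

    # The estimate is the median of the group means.
    return statistics.median(group_means)
```

As a usage illustration, calling `median_of_means([0] * 99 + [10**9], delta=0.1)` places the single extreme value in one group only, so the median of the group means is unaffected by it, whereas the plain sample mean is pulled to $10^7$. The sketch assumes the samples are i.i.d. and arrive in arbitrary order; if the input is sorted or otherwise structured, shuffling it before grouping would be advisable.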

Uniform bounds for robust mean estimators

Universal Robust Regression via Maximum Mean Discrepancy

Statistical Barriers to Affine-equivariant Estimation

Outlier-robust Mean Estimation near the Breakdown Point via Sum-of-Squares

Robust Mean Estimation Without Moments for Symmetric Distributions

Factors in a chloroplast extract specifically bind to the 5' untranslated regions of chloroplast mRNAs.

Beyond Catoni: Sharper Rates for Heavy-Tailed and Robust Mean Estimation

Online Robust Mean Estimation

Information Lower Bounds for Robust Mean Estimation

Optimal Robust Estimation under Local and Global Corruptions: Stronger Adversary and Smaller Error

Robust Sparse Mean Estimation via Sum of Squares

Robust estimations from distribution structures: I. Mean

Robust Estimation under the Wasserstein Distance

Robust estimation in finite population sampling

A Bilateral Bound on the Mean-Square Error for Estimation in Model Mismatch

Error bounds of Median-of-means estimators with VC-dimension

Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+\alpha$ Moments

Mean Estimation in Banach Spaces Under Infinite Variance and Martingale Dependence

The Geometric Median and Applications to Robust Mean Estimation

A Note on the Consistency of a Robust Estimator for Threshold Autoregressive Processes

A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation