Uniform bounds for robust mean estimators

Stanislav Minsker
DOI: https://doi.org/10.48550/arXiv.1812.03523
2019-05-04
Abstract: This paper is devoted to the estimators of the mean that provide strong non-asymptotic guarantees under minimal assumptions on the underlying distribution. The main ideas behind proposed techniques are based on bridging the notions of symmetry and robustness. We show that existing methods, such as median-of-means and Catoni's estimators, can often be viewed as special cases of our construction. The main contribution of the paper is the proof of uniform bounds for the deviations of the stochastic process defined by proposed estimators. Moreover, we extend our results to the case of adversarial contamination where a constant fraction of the observations is arbitrarily corrupted. Finally, we apply our methods to the problem of robust multivariate mean estimation and show that obtained inequalities achieve optimal dependence on the proportion of corrupted samples.
Statistics Theory, Probability
What problem does this paper attempt to address?
This paper addresses the problem of simultaneously estimating the means of multiple functions of heavy-tailed data, with strong non-asymptotic guarantees under minimal assumptions on the underlying distribution. Specifically, it constructs robust mean estimators that concentrate well when the data are heavy-tailed and that admit tight deviation bounds under minimal moment assumptions. The main contribution is the proof of uniform bounds for the deviations of the stochastic process defined by the proposed estimators. The author also extends the results to adversarial contamination, where a constant fraction of the observations is arbitrarily corrupted, and shows that the resulting inequalities for robust multivariate mean estimation achieve optimal dependence on the proportion of corrupted samples. In short, the paper develops a robust mean estimation method that remains reliable when the data are heavy-tailed or adversarially contaminated.
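For readers unfamiliar with the existing methods named in the abstract, the classical median-of-means estimator can be sketched as follows. This is a minimal illustration of that well-known estimator only, not the paper's general construction; the function name `median_of_means` and the parameter `num_blocks` are illustrative choices, and the rule of dropping leftover points is an assumption made for simplicity.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Classical median-of-means estimate of the mean of `samples`.

    Illustrative sketch: the samples are split into `num_blocks` disjoint
    blocks of equal size, the mean of each block is computed, and the
    median of those block means is returned.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    # Leftover points beyond num_blocks * block_size are dropped (assumption).
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Nine clean observations and one grossly corrupted one.
data = [1.0] * 9 + [1e6]
print(median_of_means(data, num_blocks=5))  # 1.0
print(median_of_means(data, num_blocks=1))  # ordinary sample mean, dominated by the outlier
```

With five blocks, the single corrupted observation can spoil only one block mean, so the median of the block means is unaffected; with one block the estimator reduces to the sample mean and is pulled far from 1.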