Distribution-Free Robust Linear Regression
Jaouad Mourtada, Tomas Vaškevičius, Nikita Zhivotovskiy
DOI: https://doi.org/10.4171/MSL/27
2021-02-25
Abstract: We study random design linear regression with no assumptions on the
distribution of the covariates and with a heavy-tailed response variable. In
this distribution-free regression setting, we show that boundedness of the
conditional second moment of the response given the covariates is a necessary
and sufficient condition for achieving nontrivial guarantees. As a starting
point, we prove an optimal version of the classical in-expectation bound for
the truncated least squares estimator due to Györfi, Kohler, Krzyżak,
and Walk. However, we show that this procedure fails with constant probability
for some distributions despite its optimal in-expectation performance. Then,
combining the ideas of truncated least squares, median-of-means procedures, and
aggregation theory, we construct a non-linear estimator achieving excess risk
of order $d/n$ with an optimal sub-exponential tail. While existing approaches
to linear regression for heavy-tailed distributions focus on proper estimators
that return linear functions, we highlight that the improperness of our
procedure is necessary for attaining nontrivial guarantees in the
distribution-free setting.
Machine Learning, Statistics Theory
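
The abstract names truncated least squares and median-of-means as two of the building blocks. The following is a minimal illustrative sketch of those two ingredients only, not the estimator constructed in the paper. The function names, the `threshold` and `n_blocks` parameters, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def truncated_least_squares(X, y, threshold):
    """Fit least squares, then clip predictions to [-threshold, threshold].

    Illustrative sketch: `threshold` is a hypothetical user-chosen bound.
    The clipped predictor is a non-linear function of the covariates.
    """
    # lstsq returns a minimum-norm solution, so a rank-deficient
    # design matrix does not need special handling here.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(X_new):
        return np.clip(np.asarray(X_new) @ w, -threshold, threshold)

    return predict


def median_of_means(values, n_blocks):
    """Split values into n_blocks blocks and return the median of block means."""
    blocks = np.array_split(np.asarray(values, dtype=float), n_blocks)
    return float(np.median([block.mean() for block in blocks]))


# Usage: exactly linear data, so the unclipped fit reproduces y.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.5, -0.25, 0.25])
predict = truncated_least_squares(X, y, threshold=0.3)
clipped = predict(X)  # the first prediction (0.5) is clipped to 0.3

# A single large outlier affects only one block mean.
robust_mean = median_of_means([1, 1, 1, 1, 1, 1000], n_blocks=3)
```

Clipping bounds the effect of extreme predictions, and taking a median over block means limits the influence of a few heavy-tailed observations. How the paper combines these ideas with aggregation theory is not captured by this sketch.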