Abstract: High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches based on sub-gradient descent have been proposed; unfortunately, they fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a projected sub-gradient descent algorithm for both sparse linear regression and low-rank linear regression. The algorithm is not only computationally efficient, with linear convergence, but also statistically optimal, whether the noise is Gaussian or heavy-tailed with a finite 1 + ε moment. The convergence theory is established within a general framework, and its specific applications to the absolute loss, the Huber loss and the quantile loss are investigated. Compared with existing non-convex methods, our analysis reveals a surprising two-phase convergence phenomenon. In phase one, the algorithm behaves as in typical non-smooth optimization and requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, as already observed in the existing literature. Interestingly, during phase two, the algorithm converges linearly as if it were minimizing a smooth and strongly convex objective function, so a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in a region close, but not too close, to the truth. Numerical simulations confirm our theoretical findings and showcase the superiority of our algorithm over prior methods.
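To make the two-phase idea concrete, here is a minimal sketch of projected sub-gradient descent for the sparse case, written in plain Python. It is not the authors' implementation. Several things are assumptions made for illustration only: the sparsity level s is known, the projection is hard thresholding onto s-sparse vectors, the design is standard Gaussian, the noise is a scaled Student t with 2 degrees of freedom, and the stepsize schedule (a geometric decay eta0 * q**t for `phase1_iters` steps, then a constant `eta_const`) and all of its constants are hypothetical choices, not values from the paper. The helper names `psi_absolute`, `psi_huber`, `psi_quantile` and `simulate` are likewise hypothetical; each `psi` returns one valid sub-gradient of the corresponding loss at a residual.

```python
import math
import random


def hard_threshold(beta, s):
    """Projection onto s-sparse vectors: keep the s largest-magnitude entries."""
    keep = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)[:s]
    out = [0.0] * len(beta)
    for j in keep:
        out[j] = beta[j]
    return out


# psi(r): one valid sub-gradient of the robust loss rho at residual r.
def psi_absolute(r):
    return float((r > 0) - (r < 0))  # rho(r) = |r|, with sign(0) taken as 0


def psi_huber(r, delta=1.0):
    return max(-delta, min(delta, r))  # rho = Huber loss with threshold delta


def psi_quantile(r, tau=0.5):
    return tau - (1.0 if r < 0 else 0.0)  # rho(r) = r * (tau - 1{r < 0})


def projected_subgradient_descent(X, y, s, psi, eta0=1.0, q=0.8,
                                  phase1_iters=15, eta_const=0.3, phase2_iters=60):
    """Projected sub-gradient descent on (1/n) sum_i rho(y_i - x_i^T beta)
    subject to ||beta||_0 <= s. The stepsize schedule is a hypothetical choice."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for t in range(phase1_iters + phase2_iters):
        # Phase one: geometrically decaying stepsize; phase two: constant stepsize.
        eta = eta0 * q ** t if t < phase1_iters else eta_const
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(d)) for i in range(n)]
        weights = [psi(r) for r in resid]
        # The loss sub-gradient is -(1/n) sum_i psi(r_i) x_i; we step along its negative.
        step = [sum(weights[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        beta = hard_threshold([beta[j] + eta * step[j] for j in range(d)], s)
    return beta


def student_t(df, rng):
    """Heavy-tailed draw: standard Gaussian divided by sqrt(chi^2_df / df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def simulate(n=200, d=20, seed=0):
    """Hypothetical test problem: 3-sparse truth, Gaussian design, t(2) noise."""
    rng = random.Random(seed)
    beta_star = [2.0, -1.5, 1.0] + [0.0] * (d - 3)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # t with 2 degrees of freedom has a finite 1 + eps moment only for eps < 1.
    y = [sum(x[j] * beta_star[j] for j in range(d)) + 0.5 * student_t(2.0, rng)
         for x in X]
    return X, y, beta_star


if __name__ == "__main__":
    X, y, beta_star = simulate()
    beta_hat = projected_subgradient_descent(X, y, s=3, psi=psi_absolute)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(beta_hat, beta_star)))
    print(f"l2 error: {err:.3f}")
```

Swapping `psi_absolute` for `psi_huber` or `psi_quantile` changes only the per-sample weights, which mirrors how a general framework can cover several robust losses. Because each `psi` is bounded, a few extreme noise draws cannot dominate the update, which is one intuition for robustness to heavy tails; the decaying-then-constant stepsizes only imitate the two phases described above and are not tuned.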

On the Robustness and Generalization of Cauchy Regression

Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise

On Regression in Extreme Regions

Recursive Least Squares for Censored Regression

Censored Regression with Noisy Input

Query Complexity of Least Absolute Deviation Regression via Robust Uniform Convergence

A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization

Distributional Robustness Bounds Generalization Errors

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression

A Novel Framework for Improving the Breakdown Point of Robust Regression Algorithms

Universal Robust Regression via Maximum Mean Discrepancy

A Robust Learning Algorithm for Regression Models Using Distributionally Robust Optimization under the Wasserstein Metric

Large-Scale Methods for Distributionally Robust Optimization

Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression

On the KL-Divergence-based Robust Satisficing Model

A robust adaptive linear regression method for severe noise

Learning with the Maximum Correntropy Criterion Induced Losses for Regression

The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent

A Statistical Theory of Regularization-Based Continual Learning

Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization