Abstract: High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since the robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed; unfortunately, they fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a projected sub-gradient descent algorithm for both the sparse linear regression and low-rank linear regression problems. The algorithm is not only computationally efficient, with linear convergence, but also statistically optimal, whether the noise is Gaussian or heavy-tailed with a finite $1+\epsilon$ moment. The convergence theory is established for a general framework, and its specific applications to the absolute loss, Huber loss and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of two-phase convergence. In phase one, the algorithm behaves as in typical non-smooth optimization, which requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, as already observed in the existing literature. Interestingly, during phase two, the algorithm converges linearly as if minimizing a smooth and strongly convex objective function, and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in a region close, but not too close, to the truth. Numerical simulations confirm our theoretical discovery and showcase the superiority of our algorithm over prior methods.
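To make the two-phase idea concrete, here is a minimal sketch for the sparse regression case with the absolute loss. It is not the authors' implementation: the abstract does not specify the projection or the stepsize schedule, so the hard-thresholding projection, the geometric decay in phase one, the zero initialization, and every constant below (`eta0`, `decay`, `eta_const`, the iteration counts, and the simulated data) are assumptions chosen only for illustration.

```python
import numpy as np


def hard_threshold(beta, s):
    """Keep the s largest-magnitude entries of beta and zero out the rest."""
    out = np.zeros_like(beta)
    keep = np.argpartition(np.abs(beta), -s)[-s:]
    out[keep] = beta[keep]
    return out


def two_phase_subgradient(X, y, s, eta0=2.0, decay=0.9, phase_one_iters=15,
                          eta_const=0.5, phase_two_iters=100):
    """Projected sub-gradient descent on the mean absolute loss.

    Phase one uses a geometrically decaying stepsize (assumed schedule);
    phase two switches to a constant stepsize (assumed value).
    """
    n, d = X.shape
    beta = np.zeros(d)  # assumed initialization
    for t in range(phase_one_iters + phase_two_iters):
        residual = X @ beta - y
        # A sub-gradient of (1/n) * sum_i |x_i^T beta - y_i|.
        subgrad = X.T @ np.sign(residual) / n
        if t < phase_one_iters:
            eta = eta0 * decay ** t   # phase one: decaying stepsize
        else:
            eta = eta_const           # phase two: constant stepsize
        # Project back onto the set of s-sparse vectors.
        beta = hard_threshold(beta - eta * subgrad, s)
    return beta


# Hypothetical simulation: Gaussian design, Student-t noise with 1.5 degrees
# of freedom (heavy-tailed: infinite variance, finite moments below order 1.5).
rng = np.random.default_rng(0)
n, d, s = 1000, 200, 5
beta_star = np.zeros(d)
beta_star[:s] = 3.0
X = rng.standard_normal((n, d))
y = X @ beta_star + rng.standard_t(1.5, size=n)

beta_hat = two_phase_subgradient(X, y, s)
print("estimation error:", np.linalg.norm(beta_hat - beta_star))
```

The sub-gradient of the absolute loss depends on the residuals only through their signs, so a single extreme noise value cannot dominate an update; this is the sense in which the loss is robust. The stepsizes here were hand-picked for this simulated noise scale and would need retuning for other data.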

Overfitting Reduction in Convex Regression

Faithful Variable Screening for High-Dimensional Convex Regression

An Aggressive Reduction on the Complexity of Optimization for Non-Strongly Convex Objectives

A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization

Convex Support Vector Regression

Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression

Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes

Convex Regression in Multidimensions: Suboptimality of Least Squares Estimators

Incorporating Linear Regression Problems into an Adaptive Framework with Feasible Optimizations

A Function Fitting Method

Optimal convex $M$-estimation via score matching

Unveiling low-dimensional patterns induced by convex non-differentiable regularizers

Nonconvex Sparse Logistic Regression via Proximal Gradient Descent

Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

Regularized online exponentially concave optimization

Benign overfitting in linear regression

Optimal and parameter-free gradient minimization methods for convex and nonconvex optimization

Nonconvex Sparse Logistic Regression with Weakly Convex Regularization

Fast Rates for Contextual Linear Optimization

Improved Algorithms for Convex-Concave Minimax Optimization

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity