Abstract: Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also statistically optimal, whether the noise is Gaussian or heavy-tailed. Convergence theory is established for a general framework, and specific applications to the absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in typical non-smooth optimization, which requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, a limitation already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function, and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable to low-rank tensor estimation under heavy-tailed noise, where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
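
To make the dual-phase idea concrete, here is a minimal NumPy sketch of a Riemannian sub-gradient iteration with a two-phase stepsize schedule. It is an illustration under stated assumptions, not the paper's reference implementation. The abstract does not give the observation model or the update rule, so the sketch assumes a trace-regression model y_i = <X_i, M> + noise, a tangent-space projection of the sub-gradient, and a truncated-SVD retraction. The function names (`rsgrad`, `project_tangent`, `two_phase_stepsizes`) are hypothetical. The Huber and quantile losses are used in their standard textbook forms, which may differ from the paper's by constants, and the stepsize values have to be chosen by the user.

```python
import numpy as np

def subgrad_absolute(res):
    """Sub-gradient of |res| (the value 0 is chosen at res == 0)."""
    return np.sign(res)

def subgrad_huber(res, delta=1.0):
    """Gradient of the textbook Huber loss with threshold delta."""
    return np.clip(res, -delta, delta)

def subgrad_quantile(res, tau=0.5):
    """Sub-gradient of rho_tau(res) = res * (tau - 1{res < 0})."""
    return np.where(res >= 0, tau, tau - 1.0)

def truncated_svd(M, r):
    """Top-r singular triplets of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

def project_tangent(G, U, Vt):
    """Project G onto the tangent space of the rank-r manifold at U diag(s) Vt:
    P(G) = U U^T G + G V V^T - U U^T G V V^T."""
    GV = G @ Vt.T
    return U @ (U.T @ G) + GV @ Vt - U @ (U.T @ GV) @ Vt

def two_phase_stepsizes(eta0, decay, n_phase1, eta_const, n_phase2):
    """Geometrically decaying steps (phase one), then a constant step (phase two)."""
    return [eta0 * decay**k for k in range(n_phase1)] + [eta_const] * n_phase2

def rsgrad(X, y, M0, r, subgrad, stepsizes):
    """Assumed model: y[i] = <X[i], M> + noise, with X of shape (n, d1, d2)."""
    M = M0
    for eta in stepsizes:
        U, s, Vt = truncated_svd(M, r)
        res = np.einsum("nij,ij->n", X, M) - y           # residuals
        G = np.einsum("n,nij->ij", subgrad(res), X)      # Euclidean sub-gradient
        R = project_tangent(G, U, Vt)                    # Riemannian sub-gradient
        U, s, Vt = truncated_svd(M - eta * R, r)         # retraction to rank r
        M = (U * s) @ Vt
    return M
```

A call such as `rsgrad(X, y, M0, r, subgrad_absolute, two_phase_stepsizes(eta0, 0.9, 50, eta_const, 50))` would run fifty decaying steps followed by fifty constant ones. Passing the loss as a sub-gradient function keeps the loop identical across the three losses named in the abstract.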

Gradient Descent for Robust Kernel-Based Regression

Robust Regularized Kernel Regression

Optimality of Robust Online Learning

Robust empirical risk minimization via Newton's method

Adaptive Stochastic Gradient Descent on the Grassmannian for Robust Low-Rank Subspace Recovery

Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression

Statistical Robustness of Kernel Learning Estimator with Respect to Data Perturbation

Solving Kernel Ridge Regression with Gradient-Based Optimization Methods

Robust Non-linear Regression: A Greedy Approach Employing Kernels with Application to Image Denoising

Early stopping and polynomial smoothing in regression with reproducing kernels

Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation

Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression

Computationally Efficient and Statistically Optimal Robust Low-rank Matrix Estimation

Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Statistical Robustness of Empirical Risks in Machine Learning

Solving Kernel Ridge Regression with Gradient Descent for a Non-Constant Kernel

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

On the Estimation of Derivatives Using Plug-in Kernel Ridge Regression Estimators

Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration

Optimal Rates for Coefficient-Based Regularized Regression

Convergence of Unregularized Online Learning Algorithms