Abstract: Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent have been proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also statistically optimal, whether the noise is Gaussian or heavy-tailed. Convergence theory is established for a general framework, and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in typical non-smooth optimization, which requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, a limitation already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function, and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise on the non-smooth robust losses in a region close, but not too close, to the truth. Lastly, RsGrad is applicable to low-rank tensor estimation under heavy-tailed noise, where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
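
To make the dual-phase stepsize schedule in the abstract concrete, the following is a minimal NumPy sketch of a Riemannian sub-gradient iteration with the absolute loss. It is an illustration under stated assumptions, not the paper's implementation: it assumes a trace-regression model y_i = <X_i, M*> + noise with Gaussian measurement matrices and Student-t noise, a perturbed-truth initialization instead of a data-driven one, and hand-picked stepsizes and phase lengths. All function names and parameters (`truncate_rank`, `project_tangent`, `abs_loss_subgradient`, `rsgrad`, `demo`, `eta0`, `decay`, `eta_const`) are hypothetical.

```python
import numpy as np


def truncate_rank(M, r):
    """Rank-r truncated SVD of M; returns the factors (U, s, Vt)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]


def project_tangent(U, Vt, G):
    """Project G onto the tangent space of the rank-r manifold at U diag(s) Vt."""
    UtG = U.T @ G
    GV = G @ Vt.T
    return U @ UtG + GV @ Vt - U @ (UtG @ Vt.T) @ Vt


def abs_loss_subgradient(M, X, y):
    """One sub-gradient of (1/n) * sum_i |y_i - <X_i, M>| with respect to M."""
    residual = y - np.einsum("nij,ij->n", X, M)
    return -np.einsum("n,nij->ij", np.sign(residual), X) / len(y)


def rsgrad(M0, X, y, r, eta0, decay, n_phase1, eta_const, n_phase2):
    """Sketch of the iteration: projected sub-gradient step, then rank-r retraction.

    Phase one uses geometrically decaying stepsizes eta0 * decay**it;
    phase two uses the constant stepsize eta_const (both are assumptions).
    """
    U, s, Vt = truncate_rank(M0, r)
    for it in range(n_phase1 + n_phase2):
        eta = eta0 * decay**it if it < n_phase1 else eta_const
        M = (U * s) @ Vt
        G = project_tangent(U, Vt, abs_loss_subgradient(M, X, y))
        U, s, Vt = truncate_rank(M - eta * G, r)  # retraction via truncated SVD
    return (U * s) @ Vt


def demo(seed=0):
    """Synthetic example; returns (initial error, final error, estimate)."""
    rng = np.random.default_rng(seed)
    d1, d2, r, n = 20, 20, 2, 2000
    A, _ = np.linalg.qr(rng.standard_normal((d1, r)))
    B, _ = np.linalg.qr(rng.standard_normal((d2, r)))
    M_star = (A * np.array([20.0, 15.0])) @ B.T  # rank-2 ground truth
    X = rng.standard_normal((n, d1, d2))
    # Heavy-tailed noise: Student-t with 2 degrees of freedom (infinite variance).
    y = np.einsum("nij,ij->n", X, M_star) + rng.standard_t(df=2, size=n)
    # Hypothetical warm start: the truth plus a perturbation, truncated to rank r.
    E = rng.standard_normal((d1, d2))
    U0, s0, Vt0 = truncate_rank(M_star + 6.0 * E / np.linalg.norm(E), r)
    M0 = (U0 * s0) @ Vt0
    M_hat = rsgrad(M0, X, y, r, eta0=2.0, decay=0.9,
                   n_phase1=10, eta_const=0.75, n_phase2=40)
    return np.linalg.norm(M0 - M_star), np.linalg.norm(M_hat - M_star), M_hat


if __name__ == "__main__":
    init_err, final_err, _ = demo()
    print(f"initial error {init_err:.3f}, final error {final_err:.3f}")
```

The tangent-space projection keeps each step cheap to retract, since the updated matrix has rank at most 2r before truncation. Swapping `abs_loss_subgradient` for a Huber or quantile sub-gradient would adapt the same loop to the other losses named in the abstract.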

Theoretical limits of descending $\ell_0$ sparse-regression ML algorithms

CONVERGENCE AND STABILITY OF ITERATIVELY REWEIGHTED LEAST SQUARES FOR LOW-RANK MATRIX RECOVERY

Distributed Sparse Recursive Least-Squares over Networks

Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

Optimality of $\ell_2/\ell_1$-optimization block-length dependent thresholds

Information theoretic limits of learning a sparse rule

Simultaneous support recovery in high dimensions: Benefits and perils of block $\ell_1/\ell_\infty$-regularization

Sparse Linear Regression and Lattice Problems

Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

Fl RDT based ultimate lowering of the negative spherical perceptron capacity

L2/L2-foreach sparse recovery with low risk

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Improved Convergence for $\ell_\infty$ and $\ell_1$ Regression via Iteratively Reweighted Least Squares

Precise analysis of ridge interpolators under heavy correlations -- a Random Duality Theory view

Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation

The phase diagram of compressed sensing with $\ell_0$-norm regularization

Linear Recursive Feature Machines provably recover low-rank matrices

Deep Learning Meets Sparse Regularization: A Signal Processing Perspective

Minimax Optimal rates of convergence in the shuffled regression, unlinked regression, and deconvolution under vanishing noise

Nonparametric regression using over-parameterized shallow ReLU neural networks

Quantized Low-Rank Multivariate Regression with Random Dithering