Abstract: Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory that stays within a basin with benign geometry, namely a region of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, namely phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control, measured both entrywise and in spectral norm, which might be of independent interest.
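To make "vanilla gradient descent without explicit regularization" concrete, the following is a minimal sketch for real-valued phase retrieval, assuming noiseless measurements y_i = (a_i^T x*)^2 with i.i.d. standard Gaussian vectors a_i. It minimizes the quartic loss f(x) = (1/(4m)) * sum_i ((a_i^T x)^2 - y_i)^2 from a spectral initialization, with no trimming, projection, or regularized cost. The function names, the problem sizes, and the step-size rule eta = 0.1 / ||x0||^2 are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np


def spectral_init(A, y):
    """Leading eigenvector of Y = (1/m) * sum_i y_i a_i a_i^T, rescaled.

    For Gaussian a_i, E[Y] = ||x||^2 I + 2 x x^T, so the top eigenvalue of Y
    is close to 3 ||x||^2; this rescaling is one common (assumed) choice.
    """
    m = A.shape[0]
    Y = (A.T * y) @ A / m                  # n x n weighted covariance
    eigvals, eigvecs = np.linalg.eigh(Y)   # eigenvalues in ascending order
    return np.sqrt(eigvals[-1] / 3.0) * eigvecs[:, -1]


def vanilla_gd(A, y, step=0.1, iters=1000):
    """Plain gradient descent on the quartic loss; no regularization at all."""
    m = A.shape[0]
    x = spectral_init(A, y)
    eta = step / np.dot(x, x)              # step size scaled by estimated ||x*||^2
    for _ in range(iters):
        Ax = A @ x
        # gradient of (1/(4m)) * sum_i ((a_i^T x)^2 - y_i)^2
        grad = A.T @ ((Ax ** 2 - y) * Ax) / m
        x = x - eta * grad
    return x


def dist(x, x_star):
    """Relative error up to the unavoidable global sign ambiguity."""
    err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
    return err / np.linalg.norm(x_star)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 32, 320                         # m = 10 n measurements (assumed)
    x_star = rng.standard_normal(n)
    x_star /= np.linalg.norm(x_star)
    A = rng.standard_normal((m, n))
    y = (A @ x_star) ** 2
    x_hat = vanilla_gd(A, y)
    print(f"relative error: {dist(x_hat, x_star):.2e}")
```

The step size here is a constant fraction of 1/||x*||^2 rather than a conservatively small one; the abstract's claim is that such aggressive steps remain safe because the iterates implicitly stay incoherent with the a_i.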

Understanding Implicit Regularization in Over-Parameterized Single Index Model

High-Dimensional Linear Regression via Implicit Regularization

Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing

Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization

Linear Convergence of Inexact Descent Method and Inexact Proximal Gradient Algorithms for Lower-Order Regularization Problems

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Implicit Regularization in Deep Matrix Factorization

Non-convex Penalized Estimation in High-Dimensional Models with Single-Index Structure

Robust Regularized Low-Rank Matrix Models for Regression and Classification

Robust Recovery Via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

Robust Implicit Regularization via Weight Normalization

Sparse Parameter Identification for Stochastic Systems Based on [math] Regularization

Nonlinear generalization of the monotone single index model

Least Squares Regression Can Exhibit Under-Parameterized Double Descent

Uniformly valid inference for partially linear high-dimensional single-index models

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Functional linear and single-index models: A unified approach via Gaussian Stein identity

Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization