Abstract: Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g., trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. More specifically, gradient descent follows a trajectory that stays within a basin with benign geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, namely phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control (measured entrywise and in spectral norm), which might be of independent interest.
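The abstract's central claim is that plain gradient descent, with no trimming, projection, or regularized cost, converges for problems such as phase retrieval. The sketch below illustrates such a vanilla procedure for real-valued Gaussian phase retrieval. It assumes measurements y_j = (a_j^T x*)^2, the quartic loss f(x) = (1/4m) Σ_j ((a_j^T x)^2 - y_j)^2, a spectral initialization in the style of Wirtinger flow, and an illustrative constant step size 0.1/||x0||^2. The function names (`spectral_init`, `vanilla_gd`, `dist`), problem sizes, and step size are hypothetical choices for illustration, not the paper's code. The sketch also does not implement the leave-one-out argument, which is a proof technique rather than an algorithm.

```python
import numpy as np


def spectral_init(A, y):
    """Top eigenvector of Y = (1/m) sum_j y_j a_j a_j^T, rescaled.

    For Gaussian a_j, E[Y] = ||x*||^2 I + 2 x* x*^T, whose top eigenvalue is
    3 ||x*||^2, so sqrt(lambda_max / 3) estimates ||x*||.
    """
    m = A.shape[0]
    Y = (A.T * y) @ A / m                 # scales column j of A.T by y_j
    eigvals, eigvecs = np.linalg.eigh(Y)  # eigenvalues in ascending order
    return np.sqrt(max(eigvals[-1], 0.0) / 3.0) * eigvecs[:, -1]


def vanilla_gd(A, y, step=0.1, iters=1000):
    """Plain gradient descent on f(x) = (1/4m) sum_j ((a_j^T x)^2 - y_j)^2.

    No trimming, projection, or regularization term is applied. The constant
    step size eta = step / ||x0||^2 is an illustrative assumption.
    """
    m = A.shape[0]
    x = spectral_init(A, y)
    eta = step / np.dot(x, x)
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ ((Ax ** 2 - y) * Ax) / m  # gradient of f at x
        x = x - eta * grad
    return x


def dist(x, x_star):
    """Estimation error modulo the unavoidable global sign ambiguity."""
    return min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))


# Usage on synthetic data (sizes and seed are arbitrary choices).
rng = np.random.default_rng(0)
n, m = 100, 1000
x_star = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = (A @ x_star) ** 2
x_hat = vanilla_gd(A, y)
print("relative error:", dist(x_hat, x_star) / np.linalg.norm(x_star))
```

Only the initialization is problem-specific here. The loop itself is the textbook gradient step with a fixed step size, which is the kind of unregularized procedure the abstract refers to. Because x* and -x* produce identical measurements, the error is measured up to a global sign.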

Early Stopping of Untrained Convolutional Neural Networks

An Effective Training Method for Deep Convolutional Neural Network

One-Step Early Stopping Strategy using Neural Tangent Kernel Theory and Rademacher Complexity

Regularization of Inverse Problems by Neural Networks

Conformal inference is (almost) free for neural networks trained with early stopping

Early stopping by correlating online indicators in neural networks

Convergence rates for critical point regularization

Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

Big in Japan: Regularizing networks for solving inverse problems

A Neural-Network-Based Convex Regularizer for Inverse Problems

Effective Early Stopping of Point Cloud Neural Networks

On Regularization via Early Stopping for Least Squares Regression

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

An Unconstrained Layer-Peeled Perspective on Neural Collapse

Consistency of Neural Networks with Regularization

Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators

Over-parametrized neural networks as under-determined linear systems

Implicit Sparse Regularization: The Impact of Depth and Early Stopping

On the Regularizing Property of Stochastic Gradient Descent

Regularization for convolutional kernel tensors to avoid unstable gradient problem in convolutional neural networks

Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization