Abstract: In this paper, we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that, through communication among the particles, CBO exhibits stochastic gradient descent (SGD)-like behavior despite relying solely on evaluations of the objective function. The fundamental value of such a link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for ample classes of nonsmooth and nonconvex objective functions, hence, on the one side, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other side, contrary to the conventional wisdom according to which zero-order methods ought to be inefficient or lack generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that makes it possible to alleviate the complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) and the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.
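To make the mechanism concrete, the following is a minimal sketch of a CBO iteration in the form commonly found in the CBO literature; it is not taken from the paper itself. Particles communicate only through a consensus point, a Gibbs-weighted average of their positions computed with the log-sum-exp shift, and no gradient of the objective is ever evaluated. The function names (`consensus_point`, `cbo_minimize`), the isotropic noise model, and all parameter values (`alpha`, `lam`, `sigma`, `dt`, and so on) are illustrative assumptions.

```python
import numpy as np


def consensus_point(x, f_vals, alpha):
    """Gibbs-weighted average of particle positions.

    Weights are proportional to exp(-alpha * f). Subtracting min(f) before
    exponentiating (the log-sum-exp shift) leaves the average unchanged but
    avoids overflow and guarantees at least one weight equals 1.
    """
    w = np.exp(-alpha * (f_vals - f_vals.min()))
    return (w[:, None] * x).sum(axis=0) / w.sum()


def cbo_minimize(f, dim, n_particles=200, alpha=50.0, lam=1.0, sigma=0.7,
                 dt=0.01, steps=2000, init_range=3.0, seed=0):
    """Derivative-free minimization of f with a basic CBO scheme (assumed form).

    f maps a 1-D array of length `dim` to a float. Only function values are
    used. All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Particles start uniformly in the box [-init_range, init_range]^dim.
    x = rng.uniform(-init_range, init_range, size=(n_particles, dim))
    for _ in range(steps):
        f_vals = np.array([f(p) for p in x])
        v = consensus_point(x, f_vals, alpha)
        diff = x - v
        # Isotropic noise scaled by each particle's distance to the consensus
        # point, so exploration vanishes as the particles agree.
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        noise = rng.standard_normal(x.shape)
        # Euler-Maruyama step: drift toward consensus plus scaled diffusion.
        x = x - lam * dt * diff + sigma * np.sqrt(dt) * dist * noise
    f_vals = np.array([f(p) for p in x])
    return consensus_point(x, f_vals, alpha)


def rastrigin(p):
    """A standard nonconvex test function with global minimizer at the origin."""
    return float(np.sum(p ** 2 - 10.0 * np.cos(2.0 * np.pi * p) + 10.0))


if __name__ == "__main__":
    estimate = cbo_minimize(rastrigin, dim=2)
    print("consensus point:", estimate, "objective:", rastrigin(estimate))
```

As `alpha` grows, the consensus point concentrates on the particle with the smallest objective value, which is the Laplace principle at work; with equal objective values it reduces to the plain mean of the particles.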
