Abstract: In this paper, we study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$. The objective function is not required to be convex. Rather, our results apply to a class of ``invex'' functions, which have the property that every stationary point is also a global minimizer. First, it is assumed that $J(\cdot)$ satisfies a property that is slightly weaker than the Kurdyka-Lojasiewicz (KL) condition, denoted here as (KL'). It is shown that the values $J(\boldsymbol{\theta}_t)$ along the iterates converge almost surely to the global minimum of $J(\cdot)$. Next, the hypothesis on $J(\cdot)$ is strengthened from (KL') to the Polyak-Lojasiewicz (PL) condition. With this stronger hypothesis, we derive estimates on the rate of convergence of $J(\boldsymbol{\theta}_t)$ to its limit. Using these results, we show that for functions satisfying the PL property, the convergence rate of both the objective function and the norm of the gradient under SGD is the same as the best-possible rate for convex functions. While some results along these lines have been published in the past, our contributions contain two distinct improvements. First, the assumptions on the stochastic gradient are more general than elsewhere, and second, our convergence is almost sure, and not merely in expectation. We also study SGD when only function evaluations are permitted. In this setting, we determine the ``optimal'' increments, that is, the size of the perturbations. Using the same set of ideas, we establish the global convergence of the Stochastic Approximation (SA) algorithm under more general assumptions on the measurement error, compared to the existing literature. We also derive bounds on the rate of convergence of the SA algorithm under appropriate assumptions.
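To make the setting concrete, the following is a minimal sketch of SGD on a nonconvex objective that satisfies the PL condition. Everything in it is an illustrative assumption rather than the paper's construction: the test function $J(\theta) = \theta^2 + 3\sin^2(\theta)$ (a commonly cited nonconvex PL example with global minimum $0$ at $\theta = 0$), the zero-mean Gaussian gradient noise (a special case of the more general noise assumptions the abstract refers to), the step sizes $\alpha_t = 0.1/(t+1)^{0.75}$, and the helper names `J`, `grad_J`, and `run_sgd`.

```python
import math
import random


def J(theta):
    """Hypothetical test objective: nonconvex, but satisfies the PL condition."""
    return theta ** 2 + 3.0 * math.sin(theta) ** 2


def grad_J(theta):
    """Exact gradient of J: d/dtheta [theta^2 + 3 sin^2(theta)]."""
    return 2.0 * theta + 3.0 * math.sin(2.0 * theta)


def run_sgd(theta0=3.0, steps=5000, noise_std=1.0, seed=0):
    """Run SGD with additive Gaussian gradient noise (an assumed noise model).

    The step sizes alpha_t = 0.1 / (t + 1)^0.75 are an illustrative choice:
    their sum diverges while the sum of their squares converges.
    """
    rng = random.Random(seed)
    theta = theta0
    for t in range(steps):
        alpha = 0.1 / (t + 1) ** 0.75
        # Stochastic gradient: true gradient plus zero-mean noise.
        noisy_grad = grad_J(theta) + rng.gauss(0.0, noise_std)
        theta -= alpha * noisy_grad
    return theta


if __name__ == "__main__":
    theta_final = run_sgd()
    print("J at start:", J(3.0))
    print("J at end:  ", J(theta_final))
```

A single sample path like this only illustrates the almost-sure statement; it does not verify it. Changing `seed` or `noise_std` gives a rough feel for how the final objective value varies across runs.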

Concentration bounds for two time scale stochastic approximation

Functional Central Limit Theorem for Two Timescale Stochastic Approximation

Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

Exponential Concentration in Stochastic Approximation

Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise

Concentration estimates for slowly time-dependent singular SPDEs on the two-dimensional torus

Concentration inequalities for locally small increments of compound empirical processes with applications to solutions of compound and risk averse stochastical programming

Finite-Time Decoupled Convergence in Nonlinear Two-Time-Scale Stochastic Approximation

Concentration bounds for stochastic systems with singular kernels

Theoretical analysis of a finite-volume scheme for a stochastic Allen-Cahn problem with constraint

Markovian Foundations for Quasi-Stochastic Approximation in Two Timescales: Extended Version

Temporal approximation of stochastic evolution equations with irregular nonlinearities

Remarks on Differential Inclusion limits of Stochastic Approximation

Central Limit Theorem for Two-Timescale Stochastic Approximation with Markovian Noise: Theory and Applications

Concentration of the Langevin Algorithm's Stationary Distribution

Second order concentration via logarithmic Sobolev inequalities

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus

On Asymptotic Preserving schemes for a class of Stochastic Differential Equations in averaging and diffusion approximation regimes

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications