Abstract: In this paper, we study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of a given objective function $J(\cdot)$. The objective function is not required to be convex. Rather, our results apply to a class of ``invex'' functions, which have the property that every stationary point is also a global minimizer. First, it is assumed that $J(\cdot)$ satisfies a property that is slightly weaker than the Kurdyka-Lojasiewicz (KL) condition, denoted here as (KL'). It is shown that the objective values $J(\boldsymbol{\theta}_t)$ along the iterates converge almost surely to the global minimum of $J(\cdot)$. Next, the hypothesis on $J(\cdot)$ is strengthened from (KL') to the Polyak-Lojasiewicz (PL) condition. With this stronger hypothesis, we derive estimates on the rate of convergence of $J(\boldsymbol{\theta}_t)$ to its limit. Using these results, we show that for functions satisfying the PL property, the convergence rates of both the objective function and the norm of the gradient under SGD are the same as the best-possible rates for convex functions. While some results along these lines have been published in the past, our contributions contain two distinct improvements. First, the assumptions on the stochastic gradient are more general than elsewhere; second, our convergence is almost sure, and not merely in expectation. We also study SGD when only function evaluations are permitted. In this setting, we determine the ``optimal'' increments, that is, the size of the perturbations. Using the same set of ideas, we establish the global convergence of the Stochastic Approximation (SA) algorithm under more general assumptions on the measurement error, compared to the existing literature. We also derive bounds on the rate of convergence of the SA algorithm under appropriate assumptions.
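
As a purely illustrative sketch (not taken from the paper), the following Python snippet runs SGD on the scalar function $J(\theta) = \theta^2 + 3\sin^2(\theta)$, which is nonconvex but has $\theta = 0$ as its only stationary point and is commonly cited as an example satisfying a PL-type inequality. The test function, the additive Gaussian noise model, the step sizes $\alpha_t = 1/(t+10)$, and the names `objective`, `gradient`, and `run_sgd` are all assumptions made here for illustration; the paper's actual assumptions on the stochastic gradient are more general.

```python
import math
import random


def objective(theta: float) -> float:
    """J(theta) = theta^2 + 3 sin^2(theta): nonconvex, global minimum J(0) = 0."""
    return theta ** 2 + 3.0 * math.sin(theta) ** 2


def gradient(theta: float) -> float:
    """Exact derivative J'(theta) = 2 theta + 3 sin(2 theta)."""
    return 2.0 * theta + 3.0 * math.sin(2.0 * theta)


def run_sgd(theta0: float, steps: int, noise_std: float = 1.0, seed: int = 0) -> float:
    """Run SGD with zero-mean Gaussian gradient noise (an assumed noise model).

    Step sizes alpha_t = 1 / (t + 10) are square-summable but not summable,
    the classical Robbins-Monro choice; this is an illustrative assumption.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    theta = theta0
    for t in range(steps):
        step_size = 1.0 / (t + 10)
        noisy_grad = gradient(theta) + rng.gauss(0.0, noise_std)
        theta -= step_size * noisy_grad
    return theta


if __name__ == "__main__":
    theta_final = run_sgd(theta0=3.0, steps=20000)
    # The objective value should be close to the global minimum J* = 0.
    print("final theta:", theta_final)
    print("final objective:", objective(theta_final))
```

A single sample path like this only suggests the almost-sure behavior described in the abstract; it is not evidence of any particular rate.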

Subgradient Selection Convergence Implies Uniform Subdifferential Set Convergence: And Other Tight Convergences Rates in Stochastic Convex Composite Minimization

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

Linear Convergence of Subgradient Algorithm for Convex Feasibility on Riemannian Manifolds

A Unified Analysis for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions

Stochastic subgradient for composite optimization with functional constraints

Subgradient sampling for nonsmooth nonconvex minimization

On Almost Sure Convergence Rates of Stochastic Gradient Methods

Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity

Convergence rate analysis of distributed optimization with projected subgradient algorithm

Two-norm discrepancy and convergence of the stochastic gradient method with application to shape optimization

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Stochastic Subgradient Methods with Guaranteed Global Stability in Nonsmooth Nonconvex Optimization

On Convergence Rate of Distributed Stochastic Gradient Algorithm for Convex Optimization with Inequality Constraints

Decentralized Stochastic Subgradient Methods for Nonsmooth Nonconvex Optimization

Stochastic Approximation of Smooth and Strongly Convex Functions: Beyond the Convergence Rate

Almost Sure Convergence Rates Analysis and Saddle Avoidance of Stochastic Gradient Methods

Understanding the unstable convergence of gradient descent

Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications