Abstract: It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer, fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant learning rates. The default learning rate schedules for SGD optimization methods in machine learning implementation frameworks such as TensorFlow and PyTorch are constant learning rates. In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates for the values of the objective function of the considered optimization problem (the function that one intends to minimize). In particular, we propose a learning-rate-adaptive variant of the Adam optimizer and implement it for several neural network learning problems, particularly in the context of deep learning approximation methods for partial differential equations such as deep Kolmogorov methods, physics-informed neural networks, and deep Ritz methods. In each of the presented learning problems the proposed learning-rate-adaptive variant of the Adam optimizer reduces the value of the objective function faster than the Adam optimizer with the default learning rate. For a simple class of quadratic minimization problems we also rigorously prove that a learning-rate-adaptive variant of the SGD optimization method converges to the minimizer of the considered minimization problem. Our convergence proof is based on an analysis of the laws of invariant measures of the SGD method as well as on a more general convergence analysis for SGD with random but predictable learning rates which we develop in this work.
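To make the idea of adjusting the learning rate from empirical estimates of the objective function more concrete, the following is a minimal sketch on a one-dimensional quadratic problem. The abstract does not state the precise adjustment rule, so everything specific here is an assumption made for illustration: the rule "halve the learning rate whenever a Monte Carlo estimate of the objective fails to improve on the best estimate so far", the toy objective f(θ) = E[(θ − X)²]/2 with X ~ N(1, 1), the use of plain SGD rather than Adam, and all function and parameter names (`adaptive_sgd`, `check_every`, `shrink`, and so on).

```python
import random


def noisy_grad(theta, rng):
    # Stochastic gradient of the toy objective f(theta) = E[(theta - X)^2] / 2
    # with X ~ N(1, 1); the minimizer is theta = 1 (assumed example problem).
    x = rng.gauss(1.0, 1.0)
    return theta - x


def estimate_objective(theta, rng, n_samples=256):
    # Monte Carlo (empirical) estimate of f(theta) from fresh samples.
    total = sum((theta - rng.gauss(1.0, 1.0)) ** 2 for _ in range(n_samples))
    return total / (2 * n_samples)


def adaptive_sgd(theta0=5.0, lr0=0.5, n_steps=4000, check_every=200,
                 shrink=0.5, seed=0):
    # Plain SGD whose learning rate is reduced whenever the empirical
    # objective estimate stops improving (hypothetical rule, not
    # necessarily the rule used in the paper).
    rng = random.Random(seed)
    theta, lr = theta0, lr0
    best = estimate_objective(theta, rng)
    lrs = [lr]  # learning rate in effect after each check
    for step in range(1, n_steps + 1):
        theta -= lr * noisy_grad(theta, rng)
        if step % check_every == 0:
            current = estimate_objective(theta, rng)
            if current >= best:
                lr *= shrink      # no measurable progress: reduce the step size
            else:
                best = current    # progress: keep the current step size
            lrs.append(lr)
    return theta, lrs


if __name__ == "__main__":
    theta, lrs = adaptive_sgd()
    print("final theta:", theta)
    print("learning rates at checks:", lrs)
```

With a constant learning rate the iterates of this toy problem keep fluctuating around the minimizer; in the sketch the step size is only reduced once the estimated objective stops decreasing, which is the qualitative behavior the abstract describes. The learning rate at each check depends only on information available before the next step, in the spirit of the "random but predictable" learning rates mentioned in the abstract.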

Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Stagewise Accelerated Stochastic Gradient Methods for Nonconvex Optimization

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes

Linear Convergence of Adaptive Stochastic Gradient Descent

Novel Convergence Results of Adaptive Stochastic Gradient Descents

Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Safeguarding adaptive methods: global convergence of Barzilai-Borwein and other stepsize choices

Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

Accelerate Stochastic Subgradient Method by Leveraging Local Growth Condition

Accelerated Stochastic Subgradient Methods under Local Error Bound Condition

Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions

Convergence analysis of an accelerated stochastic ADMM with larger stepsizes

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

Faster Convergence of Local SGD for Over-Parameterized Models