Abstract: The convergence behavior of Stochastic Gradient Descent (SGD) crucially depends on the stepsize configuration. When using a constant stepsize, the SGD iterates form a Markov chain, enjoying fast convergence during the initial transient phase. However, when reaching stationarity, the iterates oscillate around the optimum without making further progress. In this paper, we study the convergence diagnostics for SGD with constant stepsize, aiming to develop an effective dynamic stepsize scheme. We propose a novel coupling-based convergence diagnostic procedure, which monitors the distance of two coupled SGD iterates for stationarity detection. Our diagnostic statistic is simple and is shown theoretically to track the transition from transience to stationarity. We conduct extensive numerical experiments and compare our method against various existing approaches. Our proposed coupling-based stepsize scheme is observed to achieve superior performance across a diverse set of convex and non-convex problems. Moreover, our results demonstrate the robustness of our approach to a wide range of hyperparameters.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses **convergence diagnosis and dynamic step-size adjustment for Stochastic Gradient Descent (SGD) run with a fixed step-size**. Specifically, the authors focus on how to detect reliably whether the SGD iterates have entered a stationary state and, on that basis, design a method that adaptively adjusts the step-size to achieve faster convergence and higher accuracy.

#### Main problem background

1. **Limitations of fixed-step-size SGD**:
   - With a fixed step-size, the SGD iterates form a Markov chain and converge rapidly in the initial stage.
   - After reaching the stationary state, however, the iterates fluctuate randomly around the optimal solution and make no further progress.
   - A larger step-size therefore accelerates the initial convergence but leaves a larger approximation error, while a smaller step-size slows convergence.
2. **Deficiencies of existing methods**:
   - Traditional step-size decay strategies (such as \( \gamma_k = \frac{\gamma_0}{k} \) or \( \gamma_k = \frac{\gamma_0}{\sqrt{k}} \)) have non-asymptotic convergence guarantees, but in practice they are not robust to ill-conditioned problems and converge slowly.
   - Fixed-step-size SGD is widely used in practice because it converges quickly and is easy to tune. However, effective stationarity detection methods for reducing the step-size at the right time are lacking.
3. **Importance of stationarity detection**:
   - Accurately detecting that the SGD iterates have entered a stationary state is crucial: it allows the step-size to be reduced at an appropriate time so that optimization can continue toward better accuracy.
   - Current stationarity detection methods (such as Pflug's method and distance-based methods) are either ineffective or lack theoretical support.
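The trade-off between fast initial progress and stationary error can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a one-dimensional quadratic \( f(x) = \frac{1}{2}x^2 \) with additive Gaussian gradient noise, and the names `constant_step_sgd` and `tail_mse` are hypothetical. For this particular model the stationary variance of the iterates is \( \gamma\sigma^2 / (2 - \gamma) \), so a larger step-size \( \gamma \) settles at a larger error.

```python
import random

def constant_step_sgd(gamma, x0=10.0, n_iters=3000, sigma=1.0, seed=0):
    """Run constant step-size SGD on f(x) = 0.5 * x**2 (toy model, an assumption)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_iters):
        grad = x + rng.gauss(0.0, sigma)  # noisy gradient of 0.5 * x**2
        x -= gamma * grad                 # constant step-size update
        path.append(x)
    return path

def tail_mse(path, tail=2000):
    """Mean squared distance to the optimum x* = 0 over the last `tail` iterates."""
    last = path[-tail:]
    return sum(v * v for v in last) / len(last)

large = constant_step_sgd(gamma=0.5)
small = constant_step_sgd(gamma=0.05)

# Early phase: the large step-size is much closer to the optimum after 10 steps.
print("distance after 10 steps:", abs(large[10]), abs(small[10]))
# Stationary phase: the large step-size oscillates with a larger error.
print("stationary MSE:", tail_mse(large), tail_mse(small))
```

Under these assumptions, the run with the larger step-size approaches the optimum faster but plateaus at a higher mean squared error, which is the behavior a dynamic step-size scheme tries to exploit by starting large and shrinking once stationarity is detected.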
#### Main contributions of the paper

1. **A coupling-based convergence diagnostic**:
   - Two SGD sequences are maintained that use the same step-size and the same data points but different initializations, and the distance between them is monitored.
   - When the distance ratio between the two sequences falls below a given threshold, the iterates are considered to have entered a stationary state, which triggers a step-size reduction.
2. **An effective dynamic step-size adjustment scheme**:
   - Building on this diagnostic, the paper proposes a simple, easy-to-implement dynamic step-size adjustment algorithm.
   - The effectiveness of the diagnostic statistic is proved for general convex problems, and the scheme's superior performance is verified through experiments.
3. **Extensive experimental verification**:
   - Experiments were carried out on a range of convex and non-convex problems, including logistic regression, least-squares regression, and ResNet-18.
   - The results show that the method performs well across these tasks and is robust to the choice of hyperparameters.

In summary, this paper addresses the difficulty of adjusting the step-size effectively when fixed-step-size SGD is used in practice. It does so by introducing a coupling-based convergence diagnostic, providing new ideas and tools for improving the convergence rate and accuracy of SGD.
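The coupling idea summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a synthetic least-squares problem, defines the distance ratio as the current distance divided by the distance at the start of the current stage, halves the step-size on detection, and then re-separates the auxiliary iterate by the original offset. The threshold, the decay factor, the restart rule, and all function names (`sgd_step`, `make_data`, `coupled_sgd`) are hypothetical choices made for the example.

```python
import random

def sgd_step(x, a, b, gamma):
    """One SGD step on the single-sample loss 0.5 * (a . x - b)**2."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [xi - gamma * residual * ai for ai, xi in zip(a, x)]

def distance(x, y):
    """Euclidean distance between two points."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def make_data(w_true, n=200, noise=0.1, seed=1):
    """Synthetic least-squares data: b = a . w_true + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in w_true]
        b = sum(ai * wi for ai, wi in zip(a, w_true)) + rng.gauss(0.0, noise)
        data.append((a, b))
    return data

def coupled_sgd(data, x0, y0, gamma=0.1, threshold=0.1, decay=0.5,
                n_iters=5000, seed=0):
    """Run two coupled SGD sequences and shrink gamma when they have merged."""
    rng = random.Random(seed)
    offset = [yi - xi for xi, yi in zip(x0, y0)]
    x, y = list(x0), list(y0)
    ref = distance(x, y)            # distance at the start of the current stage
    reductions = []                 # iterations at which gamma was reduced
    for k in range(n_iters):
        a, b = rng.choice(data)     # coupling: both sequences see the same sample
        x = sgd_step(x, a, b, gamma)
        y = sgd_step(y, a, b, gamma)
        if distance(x, y) <= threshold * ref:
            gamma *= decay          # assumed reduction rule
            reductions.append(k)
            # Assumed restart rule: move the auxiliary iterate away again.
            y = [xi + oi for xi, oi in zip(x, offset)]
            ref = distance(x, y)
    return x, gamma, reductions

w_true = [1.0, -2.0]
data = make_data(w_true)
x, final_gamma, reductions = coupled_sgd(data, x0=[5.0, 5.0], y0=[-5.0, -5.0])
print("step-size reductions at iterations:", reductions)
print("final step-size:", final_gamma)
print("distance to w_true:", distance(x, w_true))
```

Because both sequences use the same samples, their distance shrinks as the influence of the initialization is forgotten, so a small distance ratio serves as a proxy for having left the transient phase. Only the main iterate `x` is returned; the auxiliary iterate exists purely for the diagnostic.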

Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent

Convergence of Markov Chains for Constant Step-size Stochastic Gradient Descent with Separable Functions

Convergence and concentration properties of constant step-size SGD through Markov chains

Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators

Demystifying the Myths and Legends of Nonconvex Convergence of SGD

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions

Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Asynchronous Accelerated Stochastic Gradient Descent

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

On the Diffusion Approximation of Nonconvex Stochastic Gradient Descent

On the Convergence and Improvement of Stochastic Normalized Gradient Descent

The Anytime Convergence of Stochastic Gradient Descent with Momentum: From a Continuous-Time Perspective

Understanding the unstable convergence of gradient descent

Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent

Accelerated Gradient Descent by Concatenation of Stepsize Schedules

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

On Faster Convergence of Scaled Sign Gradient Descent