A Generalized Version of Chung's Lemma and its Applications

Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu
2024-06-09
Abstract: Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.
Optimization and Control, Machine Learning, Probability
What problem does this paper attempt to address?
This paper aims to provide a simple, non-asymptotic convergence-rate framework for stochastic optimization methods under more general step-size rules. Specifically, the authors develop a generalized Chung's lemma that can be used to analyze the non-asymptotic convergence rates of Stochastic Gradient Descent (SGD) and Random Reshuffling (RR) under polynomial, constant, exponential, and cosine step-size rules. In particular, the study provides some new non-asymptotic complexity results for these methods under the general (θ, μ)-Polyak-Łojasiewicz (PL) condition.

### Main Contributions

1. **Generalized Chung's Lemma**: The authors derive a new generalized Chung's lemma to describe the growth behavior of a sequence $\{a_k\}$. The lemma decomposes the convergence rate into two parts: the T-induced rate and the S-induced rate. This enables the systematic establishment of non-asymptotic convergence rates under different step-size rules.
2. **Exponential and Cosine Step-Size Rules under the (θ, μ)-PL Condition**: Based on the generalized Chung's lemma, the authors provide non-asymptotic analyses of SGD and RR with exponential and cosine step-size rules under the (θ, μ)-PL condition. These analyses not only cover the existing complexity bounds but also improve the convergence rate of the cosine step-size rule and establish new results, especially in the case where the PL exponent θ ∈ (1/2, 1].
3. **Polynomial and Constant Step-Size Rules**: For completeness, the authors also provide non-asymptotic analyses of SGD and RR with polynomial and constant step-size rules under the (θ, μ)-PL condition. These results not only verify the conclusions in the existing literature but also introduce some new results, especially in the case where θ ∈ (1/2, 1].

### Technical Challenges

1. **Complex Recursion under the General (θ, μ)-PL Condition**: Under the (θ, μ)-PL condition, a more complex recursive relation needs to be analyzed:

   \[ y_{k+1} \leq (1+\ell_1\alpha_k^\tau)y_k - \ell_2\alpha_k y_k^{2\theta} + \ell_3\alpha_k^\tau \]

   When θ ∈ (1/2, 1], the term $y_k^{2\theta}$ is nonlinear in $y_k$, and the traditional non-asymptotic Chung's lemma is not applicable. The authors simplify this recursive relation by constructing an auxiliary sequence so that it conforms to the form required by the generalized Chung's lemma.
2. **Analysis of Exponential and Cosine Step-Size Rules**: For the exponential and cosine step-size rules, the assumptions required for applying the generalized Chung's lemma hold only in some of the iterations. The authors introduce a new splitting technique that combines the generalized Chung's lemma with an extension lemma to solve this problem.

### Summary

By developing the generalized Chung's lemma, this paper provides a systematic method for establishing the convergence rates and complexity bounds of various optimization methods under different step-size rules. The tool applies not only to polynomial and constant step-size rules but also, in particular, to exponential and cosine step-size rules, thus providing a new perspective for the theoretical analysis of stochastic optimization methods.
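To get a feel for the recursion discussed under Technical Challenges, the following minimal Python sketch iterates it with equality (the largest value the inequality permits at each step) under the four step-size families named above. It is an illustration, not the authors' code. The schedule formulas are commonly used forms and are assumptions here; the paper's exact parametrization may differ. The function names `step_sizes` and `iterate_recursion`, the constants `l1`, `l2`, `l3`, `tau`, and the parameters `alpha`, `beta`, `p` are likewise hypothetical and chosen arbitrarily.

```python
import math


def step_sizes(rule, K, alpha=0.1, p=1.0, beta=1.0):
    """Return [alpha_0, ..., alpha_{K-1}] for a named rule.

    All four formulas are assumed, commonly used forms, not taken from the paper.
    """
    if rule == "constant":
        return [alpha] * K
    if rule == "polynomial":
        # alpha / (k + 1)^p
        return [alpha / (k + 1) ** p for k in range(K)]
    if rule == "exponential":
        # alpha * gamma^k with gamma = (beta / K)^(1 / K)
        gamma = (beta / K) ** (1.0 / K)
        return [alpha * gamma ** k for k in range(K)]
    if rule == "cosine":
        # alpha * (1 + cos(k * pi / K)) / 2
        return [alpha * (1.0 + math.cos(k * math.pi / K)) / 2.0 for k in range(K)]
    raise ValueError(f"unknown rule: {rule}")


def iterate_recursion(alphas, y0=1.0, theta=0.5, tau=2.0, l1=1.0, l2=1.0, l3=1.0):
    """Iterate y_{k+1} = (1 + l1*a^tau)*y_k - l2*a*y_k^(2*theta) + l3*a^tau.

    Equality is used in place of the inequality; the constants are arbitrary
    illustrative values, and y is assumed to stay positive for them.
    """
    y = y0
    for a in alphas:
        y = (1.0 + l1 * a ** tau) * y - l2 * a * y ** (2.0 * theta) + l3 * a ** tau
    return y


if __name__ == "__main__":
    K = 1000
    for theta in (0.5, 1.0):
        for rule in ("constant", "polynomial", "exponential", "cosine"):
            y_final = iterate_recursion(step_sizes(rule, K), theta=theta)
            print(f"theta={theta:.1f}  {rule:<12s} y_K = {y_final:.4e}")
```

Running the script prints the final value of the sequence for each rule with θ = 1/2 and θ = 1, which makes it easy to compare how the schedules trade off the contraction term against the noise-like term $\ell_3\alpha_k^\tau$. The numbers depend entirely on the assumed constants and say nothing about the rates proved in the paper.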