A Proof of Exact Convergence Rate of Gradient Descent. Part II. Performance Criterion $(f(x_N)-f_*)/\|x_0-x_*\|^2$

Jungbin Kim
2024-12-06
Abstract: We prove the exact worst-case convergence rate of gradient descent for smooth strongly convex optimization, with respect to the performance criterion $(f(x_N)-f_*)/\Vert x_0-x_*\Vert^2$. This rate was previously conjectured in [DT14; THG17].
Optimization and Control
What problem does this paper attempt to address?
The paper sets out to prove the exact worst-case convergence rate of gradient descent for smooth strongly convex optimization, with respect to the performance criterion $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$. This rate was previously conjectured in [DT14; THG17].

### Specific Problem Description

Consider the optimization problem
\[ x^* \in \arg\min_{x \in \mathbb{R}^d} f(x), \]
where $f$ is $L$-smooth and $\mu$-strongly convex (i.e., $f \in F_{\mu, L}$), and write $f^* = f(x^*)$. For gradient descent (GD) with step size $\gamma$,
\[ x_{k + 1}=x_k-\gamma \nabla f(x_k), \]
and a given number of iterations $N$, the worst-case performance of GD is defined as
\[ \sup \left\{ \frac{f(x_N)-f(x^*)}{\|x_0 - x^*\|^2}:f \in F_{\mu, L}, x_0 \in \mathbb{R}^d, x_k \text{ is generated by GD} \right\}. \]

### Research Background and Objectives

Based on numerical evidence, Drori and Teboulle [DT14] conjectured this rate for the case $\mu = 0$, and Taylor et al. [THG17] proposed the corresponding conjecture for $\mu>0$. Specifically, the conjecture states that the GD iterates satisfy
\[ f(x_N)-f(x^*) \leq \max \left\{ \frac{\kappa}{(\kappa - 1)+(1 - \gamma \mu)^{-2N}},(1 - \gamma L)^{2N} \right\} \frac{L}{2}\|x_0 - x^*\|^2, \]
where $\kappa=\frac{\mu}{L}$ is the inverse condition number, and that this guarantee is tight.

### Main Contributions of the Paper

The paper confirms the conjecture by establishing both the upper bound above and a matching lower bound. The author uses a new approach: establishing a correspondence between the convergence analysis for the performance criterion $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$ and the convergence analysis for another performance criterion, $\frac{\|\nabla f(x_N)\|^2}{f(x_0)-f^*}$. This correspondence simplifies the proof and provides a framework for understanding how the two performance criteria relate.

### Conclusions

By proving the conjecture, the paper determines the exact worst-case convergence rate of gradient descent for smooth strongly convex optimization with respect to the performance criterion $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$. The result fills a gap in classical first-order convex optimization theory, and the correspondence between the two criteria offers a tool for future analyses.
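
As a numerical illustration of the rate discussed above, the following minimal Python sketch evaluates the conjectured bound and compares it with gradient descent on simple diagonal quadratics. It reads the first branch of the maximum as $\kappa / ((\kappa - 1) + (1 - \gamma\mu)^{-2N})$ with $\kappa = \mu/L$. The function names `conjectured_rate` and `gd_ratio_on_diagonal_quadratic` are hypothetical, and the step-size range $0 < \gamma < 2/L$ and the $\mu = 0$ limit $1/(2N\gamma L + 1)$ are assumptions not stated in the summary. The quadratics are sample instances only; they do not replace the paper's proof.

```python
def conjectured_rate(mu, L, gamma, N):
    """Conjectured worst-case value of (f(x_N) - f*) / ||x_0 - x*||^2.

    Assumes 0 <= mu < L and 0 < gamma < 2/L (a range assumed for this sketch).
    """
    if not (0 <= mu < L and 0 < gamma < 2 / L):
        raise ValueError("need 0 <= mu < L and 0 < gamma < 2/L")
    second = (1 - gamma * L) ** (2 * N)
    if mu == 0:
        # Assumed limit of the first branch as mu -> 0.
        first = 1 / (2 * N * gamma * L + 1)
    else:
        kappa = mu / L
        q = (1 - gamma * mu) ** (2 * N)
        # Same as kappa / ((kappa - 1) + 1/q), multiplied through by q
        # so that q == 0 does not cause a division by zero.
        first = kappa * q / (1 - (1 - kappa) * q)
    return 0.5 * L * max(first, second)


def gd_ratio_on_diagonal_quadratic(eigs, x0, gamma, N):
    """Run N GD steps on f(x) = 0.5 * sum(eigs[i] * x[i]**2), minimized at 0.

    Returns (f(x_N) - f*) / ||x_0 - x*||^2 with x* = 0 and f* = 0.
    """
    x = list(x0)
    for _ in range(N):
        # The gradient of f is eigs[i] * x[i] in each coordinate.
        x = [xi - gamma * lam * xi for lam, xi in zip(eigs, x)]
    f_N = 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))
    dist0_sq = sum(xi * xi for xi in x0)
    return f_N / dist0_sq


if __name__ == "__main__":
    mu, L, N = 0.1, 1.0, 10
    for gamma in (0.5, 1.0, 1.5, 1.9):
        bound = conjectured_rate(mu, L, gamma, N)
        ratio = gd_ratio_on_diagonal_quadratic([mu, L], [1.0, 1.0], gamma, N)
        print(f"gamma={gamma}: observed ratio={ratio:.6e}, bound={bound:.6e}")
```

On the one-dimensional quadratic $f(x) = \frac{L}{2}x^2$, the observed ratio equals $\frac{L}{2}(1-\gamma L)^{2N}$, so that instance attains the second branch of the maximum whenever that branch dominates. These quadratic examples do not by themselves show tightness of the first branch.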