Abstract: We prove the exact worst-case convergence rate of gradient descent for smooth strongly convex optimization, with respect to the performance criterion $(f(x_N)-f_*)/\Vert x_0-x_*\Vert^2$. This rate was previously conjectured in [DT14; THG17].
What problem does this paper attempt to address?
This paper sets out to prove the exact worst-case convergence rate of gradient descent for smooth, strongly convex optimization, measured by the performance metric $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$. This rate was previously conjectured in the literature [DT14; THG17].
### Specific Problem Description
Consider the following optimization problem:
\[ x^* \in \arg\min_{x \in \mathbb{R}^d} f(x), \]
where $f$ is an $L$-smooth and $\mu$-strongly convex function (i.e., $f \in F_{\mu, L}$). For the gradient descent method (GD):
\[ x_{k + 1}=x_k-\gamma \nabla f(x_k), \]
given a number of iterations $N$, the worst-case performance (convergence rate) of gradient descent is defined as:
\[ \sup \left\{ \frac{f(x_N)-f(x^*)}{\|x_0 - x^*\|^2}:f \in F_{\mu, L}, x_0 \in \mathbb{R}^d, x_k \text{ is generated by GD} \right\}. \]
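To make the definition concrete, the following minimal Python sketch evaluates the ratio $\frac{f(x_N)-f(x^*)}{\|x_0 - x^*\|^2}$ for a single problem instance. It does not compute the supremum, which ranges over all functions in $F_{\mu, L}$ and all starting points. The diagonal quadratic test function, the constants, and all function names are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def gradient_descent(grad: Callable[[Sequence[float]], Vector],
                     x0: Sequence[float], gamma: float, n_steps: int) -> Vector:
    """Run x_{k+1} = x_k - gamma * grad(x_k) for n_steps iterations."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
    return x


def performance_ratio(f: Callable[[Sequence[float]], float],
                      grad: Callable[[Sequence[float]], Vector],
                      x0: Sequence[float], x_star: Sequence[float],
                      gamma: float, n_steps: int) -> float:
    """Return (f(x_N) - f(x_star)) / ||x_0 - x_star||^2 for one instance."""
    x_n = gradient_descent(grad, x0, gamma, n_steps)
    dist_sq = sum((a - b) ** 2 for a, b in zip(x0, x_star))
    return (f(x_n) - f(x_star)) / dist_sq


# Hypothetical instance: a diagonal quadratic whose curvatures lie in [MU, L],
# so it is L-smooth and MU-strongly convex with minimizer at the origin.
MU, L = 0.1, 1.0
CURVATURES = [MU, 0.5, L]


def f_quad(x: Sequence[float]) -> float:
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad_quad(x: Sequence[float]) -> Vector:
    return [c * xi for c, xi in zip(CURVATURES, x)]


ratio = performance_ratio(f_quad, grad_quad,
                          x0=[1.0, 1.0, 1.0], x_star=[0.0, 0.0, 0.0],
                          gamma=1.0 / L, n_steps=10)
print(ratio)
```

Any single instance like this only gives a lower estimate of the worst case; the quantity defined above is the largest such ratio over the whole function class.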
### Research Background and Objectives
Based on numerical evidence, Drori and Teboulle [DT14] conjectured this convergence rate for the case $\mu = 0$, and Taylor et al. [THG17] proposed the analogous conjecture for the case $\mu>0$. Specifically, they conjectured that the iterates of gradient descent satisfy:
\[ f(x_N)-f(x^*) \leq \max \left\{ \frac{\kappa}{(\kappa - 1)+(1 - \gamma \mu)^{-2N}},\ (1 - \gamma L)^{2N} \right\} \frac{L}{2}\|x_0 - x^*\|^2, \]
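where $\kappa=\frac{\mu}{L}$ is the inverse condition number. Moreover, the conjecture asserts that this convergence guarantee is tight, that is, it cannot be improved in general.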
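The bound can be evaluated numerically. The sketch below reads the first term of the maximum as $\frac{\kappa}{(\kappa-1)+(1-\gamma\mu)^{-2N}}$ and rewrites it in an algebraically equivalent form that avoids negative powers. It rests on two assumptions not stated in the text above: the step size is restricted to $0 < \gamma < 2/L$, and the case $\mu = 0$ is handled through the limit $\frac{1}{1 + 2N\gamma L}$ of the first term as $\mu \to 0$. The function names are hypothetical. As a sanity check, the sketch compares the bound with the ratio attained by the one-dimensional quadratics $\frac{c}{2}x^2$ for $c \in \{\mu, L\}$, for which $f(x_N)-f(x^*) = \frac{c}{2}(1-\gamma c)^{2N}\|x_0 - x^*\|^2$.

```python
def conjectured_rate(mu: float, L: float, gamma: float, n_steps: int) -> float:
    """Evaluate (L/2) * max{kappa / ((kappa-1) + (1-gamma*mu)^(-2N)), (1-gamma*L)^(2N)}.

    Assumes 0 <= mu <= L, 0 < gamma < 2/L and n_steps >= 0 (assumed ranges).
    """
    if not (0.0 <= mu <= L) or not (0.0 < gamma < 2.0 / L) or n_steps < 0:
        raise ValueError("expected 0 <= mu <= L, 0 < gamma < 2/L, n_steps >= 0")
    kappa = mu / L
    if mu == 0.0:
        # Limit of the first term as mu -> 0.
        first = 1.0 / (1.0 + 2.0 * n_steps * gamma * L)
    else:
        # With q = (1 - gamma*mu)^(2N):
        # kappa / ((kappa - 1) + 1/q) == kappa * q / (1 - (1 - kappa) * q).
        q = (1.0 - gamma * mu) ** (2 * n_steps)
        first = kappa * q / (1.0 - (1.0 - kappa) * q)
    second = (1.0 - gamma * L) ** (2 * n_steps)
    return 0.5 * L * max(first, second)


def quadratic_ratio(curvature: float, gamma: float, n_steps: int) -> float:
    """Ratio (f(x_N) - f*) / ||x_0 - x*||^2 for f(x) = (curvature / 2) * x^2."""
    return 0.5 * curvature * (1.0 - gamma * curvature) ** (2 * n_steps)


mu_demo, L_demo, n_demo = 0.1, 1.0, 10
for gamma_demo in (0.5, 1.0, 1.5, 1.9):
    bound = conjectured_rate(mu_demo, L_demo, gamma_demo, n_demo)
    print(gamma_demo, bound,
          quadratic_ratio(mu_demo, gamma_demo, n_demo),
          quadratic_ratio(L_demo, gamma_demo, n_demo))
```

The quadratic $\frac{L}{2}x^2$ attains the second term of the maximum exactly, so it matches the bound whenever that term dominates (for example, for step sizes close to $2/L$). The quadratic $\frac{\mu}{2}x^2$ always stays at or below the first term.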
### Main Contributions of the Paper
This paper confirms the conjecture by proving the upper bound in the inequality above together with a matching lower bound. The proof rests on a new technique: establishing a correspondence between the convergence analysis for the performance metric $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$ and the convergence analysis for a second performance metric, $\frac{\|\nabla f(x_N)\|^2}{f(x_0)-f^*}$. This correspondence simplifies the proof and provides a framework for understanding how the two performance metrics relate.
### Conclusions
By proving the conjecture, this paper determines the exact worst-case convergence rate of gradient descent for smooth, strongly convex optimization with respect to the performance metric $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$. The result complements classical first-order convex optimization theory and provides new perspectives and tools for future research.