What problem does this paper attempt to address?

This paper proves the exact worst-case convergence rate of gradient descent for smooth and strongly convex optimization problems, measured by the performance metric $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$. This rate was previously conjectured in the literature [DT14; THG17].

### Specific Problem Description

Consider the following optimization problem:
\[ x^* \in \arg\min_{x \in \mathbb{R}^d} f(x), \]
where $f$ is an $L$-smooth and $\mu$-strongly convex function (i.e., $f \in F_{\mu, L}$). For the gradient descent method (GD) with constant step size $\gamma$,
\[ x_{k + 1}=x_k-\gamma \nabla f(x_k), \]
and a given number of iterations $N$, the convergence rate (worst-case performance) of gradient descent is defined as:
\[ \sup \left\{ \frac{f(x_N)-f(x^*)}{\|x_0 - x^*\|^2}:f \in F_{\mu, L}, x_0 \in \mathbb{R}^d, x_k \text{ is generated by GD} \right\}. \]

### Research Background and Objectives

Based on numerical evidence, Drori and Teboulle conjectured this convergence rate for the case $\mu = 0$, and Taylor et al. proposed a similar conjecture for the case $\mu>0$. Specifically, they conjectured that the iterates of gradient descent satisfy:
\[ f(x_N)-f(x^*) \leq \max \left\{ \frac{\kappa}{(\kappa - 1)+(1 - \gamma \mu)^{-2N}},(1 - \gamma L)^{2N} \right\} \frac{L}{2}\|x_0 - x^*\|^2, \]
where $\kappa=\frac{\mu}{L}$, and that this convergence guarantee is tight.

### Main Contributions of the Paper

This paper confirms the conjecture by proving the upper bound in the above inequality together with a matching lower bound. The authors use a new method: they establish a correspondence between the convergence analysis for the performance metric $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$ and the convergence analysis for another performance metric, $\frac{\|\nabla f(x_N)\|^2}{f(x_0)-f^*}$. This method not only simplifies the proof but also provides a theoretical framework for understanding the relationship between these two performance metrics.

### Conclusions

By proving the above conjecture, this paper determines the exact worst-case convergence rate of gradient descent for smooth and strongly convex optimization problems with respect to the performance metric $\frac{f(x_N)-f^*}{\|x_0 - x^*\|^2}$. The result supplements the classical theory of first-order convex optimization and provides new perspectives and tools for future research.
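
As a rough numerical illustration of the rate discussed above, the following Python sketch evaluates the conjectured coefficient and compares it with gradient descent on one-dimensional quadratics. It is not taken from the paper. It reads the first term of the maximum as $\kappa / ((\kappa - 1) + (1 - \gamma\mu)^{-2N})$ with $\kappa = \mu/L$, and it assumes $0 < \gamma < 2/L$ and $\gamma\mu \neq 1$. The function names `conjectured_rate`, `gradient_descent`, and `quadratic_ratio` are hypothetical helpers introduced only for this example.

```python
def conjectured_rate(N, gamma, mu, L):
    """Coefficient C in the bound f(x_N) - f* <= C * ||x_0 - x*||^2.

    Assumes 0 <= mu <= L, 0 < gamma < 2/L, and gamma * mu != 1.
    """
    kappa = mu / L
    if mu == 0:
        # Limit of the first term as kappa -> 0 (first-order expansion).
        first = 1.0 / (2 * N * gamma * L + 1)
    else:
        first = kappa / ((kappa - 1) + (1 - gamma * mu) ** (-2 * N))
    second = (1 - gamma * L) ** (2 * N)
    return 0.5 * L * max(first, second)


def gradient_descent(grad, x0, gamma, N):
    """Run N steps of x_{k+1} = x_k - gamma * grad(x_k) and return x_N."""
    x = x0
    for _ in range(N):
        x = x - gamma * grad(x)
    return x


def quadratic_ratio(c, gamma, N, x0=1.0):
    """(f(x_N) - f*) / |x_0 - x*|^2 for f(x) = (c/2) x^2, where x* = 0, f* = 0."""
    xN = gradient_descent(lambda x: c * x, x0, gamma, N)
    return 0.5 * c * xN ** 2 / x0 ** 2


# Illustrative parameters only; any mu <= L and gamma in (0, 2/L) could be used.
mu, L, N = 0.1, 1.0, 5
for gamma in (0.5, 1.0, 1.5, 1.9):
    bound = conjectured_rate(N, gamma, mu, L)
    reached = max(quadratic_ratio(c, gamma, N) for c in (mu, L))
    print(f"gamma={gamma}: quadratics reach {reached:.6f}, bound {bound:.6f}")
```

For the quadratic $f(x) = \frac{L}{2}x^2$, the ratio equals $\frac{L}{2}(1-\gamma L)^{2N}$, so this simple example attains the second term of the maximum whenever that term dominates. The quadratic with curvature $\mu$ stays strictly below the first term when $\mu < L$, so quadratics of this kind alone do not show tightness in that regime.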

A Proof of Exact Convergence Rate of Gradient Descent. Part II. Performance Criterion $(f(x_N)-f_*)/\|x_0-x_*\|^2$

A Proof of Exact Convergence Rate of Gradient Descent. Part I. Performance Criterion $\Vert \nabla f(x_N)\Vert^2/(f(x_0)-f_*)$

Exact worst-case convergence rates of gradient descent: a complete analysis for all constant stepsizes over nonconvex and convex functions

On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions

Exact convergence rate of the last iterate in subgradient methods

Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Inexact Riemannian Gradient Descent Method for Nonconvex Optimization

Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

Provably Faster Gradient Descent via Long Steps

The Average Rate of Convergence of the Exact Line Search Gradient Descent Method

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

Derivatives of Stochastic Gradient Descent in parametric optimization

A new convergence rate of the steepest descent regarding the Euclidean norm

Accelerated Objective Gap and Gradient Norm Convergence for Gradient Descent via Long Steps

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

Concavifiability and convergence: necessary and sufficient conditions for gradient descent analysis

Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates

Exact Linear Convergence Rate Analysis for Low-Rank Symmetric Matrix Completion via Gradient Descent
