A Proof of Exact Convergence Rate of Gradient Descent. Part I. Performance Criterion $\Vert \nabla f(x_N)\Vert^2/(f(x_0)-f_*)$

Jungbin Kim
2024-12-06
Abstract: We prove the exact worst-case convergence rate of gradient descent for smooth strongly convex optimization, with respect to the performance criterion $\Vert \nabla f(x_N)\Vert^2/(f(x_0)-f_*)$. The proof differs from the previous one by Rotaru et al. [RGP24], and is based on the performance estimation methodology [DT14].
Optimization and Control
What problem does this paper attempt to address?
The paper proves, by a new argument, the exact worst-case convergence rate of gradient descent (GD) on smooth strongly convex problems, measured by the performance criterion $\frac{\|\nabla f(x_N)\|_2^2}{f(x_0) - f_*}$. Here $f$ is an $L$-smooth and $\mu$-strongly convex function, $x_0$ is the initial point, $x_N$ is the iterate after $N$ steps, and $f_*$ is the optimal value.

### Detailed Explanation

1. **Research Background**:
   - The paper analyzes the worst-case convergence of gradient descent.
   - The problem is $\min_{x \in \mathbb{R}^d} f(x)$ with $f \in \mathcal{F}_{\mu, L}$, the class of $L$-smooth and $\mu$-strongly convex functions.
   - GD uses the update $x_{k+1} = x_k - \gamma \nabla f(x_k)$ with a fixed step size $\gamma \in (0, 2/L)$.
2. **Performance Metric**:
   - The quantity studied is $\sup \left\{\frac{\|\nabla f(x_N)\|_2^2}{f(x_0)-f_*} : f \in \mathcal{F}_{\mu, L},\ x_0 \in \mathbb{R}^d,\ x_1, \dots, x_N \text{ generated by GD}\right\}$.
   - It is the worst case, over all functions in the class and all starting points, of the squared gradient norm after $N$ GD steps relative to the initial function-value gap.
3. **Previous Work**:
   - Rotaru et al. [RGP24] already determined the exact value of this metric, in the form of the tight bound
     \[
     \frac{1}{2L}\|\nabla f(x_N)\|_2^2 \leq \max \left\{\frac{\kappa}{\kappa - 1 + (1-\gamma\mu)^{-2N}},\ (1 - \gamma L)^{2N}\right\}(f(x_0)-f_*),
     \]
     where $\kappa = \mu/L$.
4. **Contributions of This Paper**:
   - A new proof of this result, different from the one in [RGP24] and based on the performance estimation methodology [DT14].
   - By choosing a specific combination of interpolation inequalities, the author proves that the expression above is an upper bound on the performance metric.
5. **Proof Strategy**:
   - Form a weighted sum of interpolation inequalities.
   - Reduce the resulting inequality to checking that certain matrices are positive semidefinite.
   - Construct an appropriate dual feasible point, which yields the claimed convergence rate.

### Summary

The paper's main objective is an alternative proof, based on performance estimation, of the exact worst-case convergence rate of gradient descent for smooth strongly convex problems under the criterion $\|\nabla f(x_N)\|_2^2/(f(x_0)-f_*)$; the rate itself was first established in [RGP24]. Beyond confirming the rate, the new proof adds to the understanding of gradient descent's worst-case behavior.
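A numerical sanity check of the bound stated under Previous Work can help make it concrete. The following is a minimal sketch under stated assumptions. The helper names `rate_factor` and `gd_criterion_on_quadratic` are hypothetical. The test functions are one-dimensional quadratics $f(x) = a x^2/2$ with $a \in [\mu, L]$, which belong to $\mathcal{F}_{\mu,L}$. For such $f$, GD gives $x_N = (1-\gamma a)^N x_0$, so the criterion equals $2a(1-\gamma a)^{2N}$. The code compares this value with $2L$ times the maximum in the bound. It assumes $0 < \mu < L$ and $0 < \gamma < 2/L$.

```python
# Minimal sketch (assumptions: 1-D quadratic test functions, 0 < mu < L, 0 < gamma < 2/L).
# Helper names are hypothetical, chosen for this illustration only.


def rate_factor(mu: float, L: float, gamma: float, N: int) -> float:
    """max{kappa/(kappa - 1 + (1 - gamma*mu)^(-2N)), (1 - gamma*L)^(2N)} with kappa = mu/L.

    The first term is evaluated as kappa*q / ((kappa - 1)*q + 1) with q = (1 - gamma*mu)^(2N),
    which is algebraically identical for q > 0 and avoids a division by zero when gamma*mu == 1.
    """
    kappa = mu / L
    q = (1.0 - gamma * mu) ** (2 * N)
    first = kappa * q / ((kappa - 1.0) * q + 1.0)
    second = (1.0 - gamma * L) ** (2 * N)
    return max(first, second)


def gd_criterion_on_quadratic(a: float, gamma: float, N: int, x0: float = 1.0) -> float:
    """Run N GD steps on f(x) = a*x**2/2 (minimizer 0, f_* = 0) and return
    f'(x_N)**2 / (f(x_0) - f_*)."""
    x = x0
    for _ in range(N):
        x = x - gamma * a * x  # x_{k+1} = x_k - gamma * f'(x_k), where f'(x) = a*x
    return (a * x) ** 2 / (0.5 * a * x0 ** 2)


if __name__ == "__main__":
    mu, L, N = 0.1, 1.0, 5
    for gamma in (0.5, 1.0, 1.9):
        # The bound reads ||grad f(x_N)||^2 / (f(x_0) - f_*) <= 2L * rate_factor(...)
        bound = 2.0 * L * rate_factor(mu, L, gamma, N)
        # f(x) = a*x**2/2 lies in F_{mu,L} for every curvature a in [mu, L]
        grid = [mu + (L - mu) * i / 200 for i in range(201)]
        worst_quadratic = max(gd_criterion_on_quadratic(a, gamma, N) for a in grid)
        print(f"gamma={gamma}: bound={bound:.4f}, worst quadratic={worst_quadratic:.4f}")
```

With $a = L$ the criterion equals $2L(1-\gamma L)^{2N}$, which is exactly the second term of the bound, so quadratics attain it. A quadratic with curvature $\mu$ gives only $2\mu(1-\gamma\mu)^{2N}$, which never exceeds the first term (and is strictly smaller when $\mu < L$ and $\gamma\mu \neq 1$). This check therefore illustrates the bound without exhibiting a function that attains its first term. Letting $\mu \to 0$ with $\gamma$, $L$, $N$ fixed, the first term tends to $1/(1 + 2N\gamma L)$, since $(1-\gamma\mu)^{-2N} = 1 + 2N\gamma\mu + O(\mu^2)$.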
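The weighted-sum and positive-semidefiniteness strategy can be illustrated in the simplest case, $N = 0$. The derivation below is worked out by hand as a hedged illustration and is not the paper's argument. For $N = 0$ both terms of the maximum equal $1$, so the bound reads $\frac{1}{2L}\|\nabla f(x_0)\|_2^2 \le f(x_0)-f_*$. The sketch assumes the standard interpolation inequality for $\mathcal{F}_{\mu,L}$ with $\mu < L$ (due to Taylor, Hendrickx and Glineur). It is stated for points $x_i, x_j$ with $f_i = f(x_i)$, $g_i = \nabla f(x_i)$, and $\kappa = \mu/L$:

\[
f_i \ge f_j + \langle g_j, x_i - x_j\rangle + \frac{1}{2(1-\kappa)}\left(\frac{1}{L}\|g_i - g_j\|_2^2 + \mu\|x_i - x_j\|_2^2 - 2\kappa\langle g_j - g_i,\, x_j - x_i\rangle\right).
\]

Take $i = 0$ and $j = *$, so that $g_* = 0$, and write $g = g_0$ and $d = x_0 - x_*$. Then a single inequality with weight $1$ suffices:

\[
\begin{aligned}
f(x_0) - f_* - \frac{1}{2L}\|g\|_2^2
&\ge \frac{1}{2(1-\kappa)}\left(\frac{1}{L}\|g\|_2^2 + \mu\|d\|_2^2 - 2\kappa\langle g, d\rangle\right) - \frac{1}{2L}\|g\|_2^2 \\
&= \frac{1}{2(1-\kappa)}\left(\frac{\kappa}{L}\|g\|_2^2 - 2\kappa\langle g, d\rangle + \mu\|d\|_2^2\right) \\
&= \frac{\mu}{2(1-\kappa)}\left\|\frac{g}{L} - d\right\|_2^2 \;\ge\; 0 .
\end{aligned}
\]

The last quadratic form is $\frac{\mu}{2(1-\kappa)}$ times the Gram form of $(g, d)$ with matrix $\begin{pmatrix}1/L^2 & -1/L\\ -1/L & 1\end{pmatrix}$, which is positive semidefinite (it has rank one). For $N \ge 1$, the summary indicates that the paper combines more interpolation inequalities with suitable weights and checks positive semidefiniteness of larger matrices. This sketch does not reproduce those weights.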