A template for gradient norm minimization

Mihai I. Florea
2024-10-30
Abstract: The gradient mapping norm is a strong and easily verifiable stopping criterion for first-order methods on composite problems. When the objective exhibits the quadratic growth property, the gradient mapping norm minimization problem can be solved by online parameter-free and adaptive first-order schemes with near-optimal worst-case rates. In this work we address problems where quadratic growth is absent, a class for which no methods with all the aforementioned properties are known to exist. We formulate a template whose instantiation recovers the existing Performance Estimation derived approaches. Our framework provides a simple human-readable interpretation along with runtime convergence rates for these algorithms. Moreover, our template can be used to construct a quasi-online parameter-free method applicable to the entire class of composite problems while retaining the optimal worst-case rates with the best known proportionality constant. The analysis also allows for adaptivity. Preliminary simulation results suggest that our scheme is highly competitive in practice with the existing approaches, either obtained via Performance Estimation or employing Accumulative Regularization.
Optimization and Control
What problem does this paper attempt to address?
The paper asks how to design a method that minimizes the gradient mapping norm without the quadratic growth property while attaining the optimal worst-case convergence rate and remaining parameter-free, adaptive, and online. The question arises in composite optimization: when the objective lacks quadratic growth, no known method has all of these properties at once.

### Specific Problem Description

1. **Minimization of the Gradient Mapping Norm**: The gradient mapping norm is a strong and easily verifiable stopping criterion for first-order methods applied to composite problems. When the objective has the quadratic growth property, the gradient mapping norm can be minimized by online, parameter-free, and adaptive first-order algorithms at a near-optimal worst-case rate. When quadratic growth is absent, no method is known to have all of these properties simultaneously.
2. **Limitations of Existing Methods**:
   - When the objective has the quadratic growth property, accelerated gradient methods (such as R-AGMM) simultaneously minimize the distance to the optimal solution, the function residual, and the gradient mapping norm, with a linear convergence rate.
   - When quadratic growth is absent, methods such as AMGS, ACGM, and AGMM minimize the function residual but lack comparable guarantees for the gradient mapping norm.
3. **Research Objectives**: The paper aims to fill this gap with a new template method that minimizes the gradient mapping norm without relying on quadratic growth, while retaining the optimal worst-case convergence rate and remaining parameter-free, adaptive, and online.

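To make the stopping criterion concrete, here is a minimal Python sketch of the gradient mapping norm used as a termination test. It assumes one common textbook definition, $g_L(x) = L\,(x - \mathrm{prox}_{\Psi/L}(x - \nabla f(x)/L))$ for an objective $f + \Psi$; the paper may use a different variant. The toy problem (a diagonal quadratic plus an $\ell_1$ term), the helper names, and the plain proximal gradient loop are all illustrative assumptions, not the paper's template or any of the methods named above.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def gradient_mapping(x, grad_f, L, lam):
    """Return (x_plus, g): the proximal gradient step from x and the
    gradient mapping g = L * (x - x_plus) (assumed definition)."""
    gx = grad_f(x)
    x_plus = [soft_threshold(xi - gi / L, lam / L) for xi, gi in zip(x, gx)]
    g = [L * (xi - pi) for xi, pi in zip(x, x_plus)]
    return x_plus, g

def proximal_gradient(x0, grad_f, L, lam, tol=1e-8, max_iter=10_000):
    """Plain (non-accelerated) proximal gradient method that stops as soon
    as the gradient mapping norm falls below tol."""
    x = list(x0)
    for k in range(max_iter):
        x_plus, g = gradient_mapping(x, grad_f, L, lam)
        if norm(g) <= tol:
            return x, norm(g), k
        x = x_plus
    return x, norm(gradient_mapping(x, grad_f, L, lam)[1]), max_iter

# Hypothetical toy problem: f(x) = 0.5 * sum(a_i * (x_i - c_i)^2),
# Psi(x) = lam * ||x||_1, with Lipschitz constant L = max(a_i).
a = [1.0, 4.0, 0.5]
c = [3.0, -2.0, 0.1]
lam = 0.5

def grad_f(x):
    return [ai * (xi - ci) for ai, xi, ci in zip(a, x, c)]

L = max(a)
x, gnorm, iters = proximal_gradient([0.0, 0.0, 0.0], grad_f, L, lam)
print(x, gnorm, iters)
```

Because this toy objective is separable, its minimizer is available in closed form as `soft_threshold(c_i, lam / a_i)` per coordinate, which gives a direct check on the point returned when the criterion fires. The test needs only quantities computed during the iteration, which is what makes it easy to verify at runtime.
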
### Solution

The paper proposes a simple template for minimizing the gradient mapping norm given the initial function residual. Instantiating this template recovers the existing Performance Estimation derived methods and gives them a clear interpretation. The template can also be used to construct a quasi-online parameter-free method that applies to the entire class of composite problems and retains the optimal worst-case convergence rate.

Specifically, the main contributions of the paper include:

- Proposing a general algorithm template that minimizes the gradient mapping norm without relying on the quadratic growth property.
- Recovering the existing optimal gradient methods (such as OGM-G and FISTA-G) by instantiating the template, and providing a clear interpretation of these methods.
- Enhancing FISTA-G into a new optimized composite gradient method (OCGM-G) that has the best known worst-case rate for minimizing the gradient mapping norm.
- Combining ACGM and OCGM-G into a quasi-online parameter-free method that is fully adaptive in the ACGM part and allows the OCGM-G part to be adaptive as well.

### Conclusion

The paper shows how to design a method for minimizing the gradient mapping norm, in the absence of the quadratic growth property, that has the optimal worst-case convergence rate and is parameter-free, adaptive, and quasi-online. Preliminary simulation results suggest that the proposed scheme is highly competitive in practice with the existing approaches, whether obtained via Performance Estimation or based on Accumulative Regularization.