ODE-based Learning to Optimize

Zhonglin Xie, Wotao Yin, Zaiwen Wen
2024-06-04
Abstract: Recent years have seen a growing interest in understanding acceleration methods through the lens of ordinary differential equations (ODEs). Despite the theoretical advancements, translating the rapid convergence observed in continuous-time models to discrete-time iterative methods poses significant challenges. In this paper, we present a comprehensive framework integrating the inertial systems with Hessian-driven damping equation (ISHD) and learning-based approaches for developing optimization methods through a deep synergy of theoretical insights. We first establish the convergence condition for ensuring the convergence of the solution trajectory of ISHD. Then, we show that provided the stability condition, another relaxed requirement on the coefficients of ISHD, the sequence generated through the explicit Euler discretization of ISHD converges, which gives a large family of practical optimization methods. In order to select the best optimization method in this family for certain problems, we introduce the stopping time, the time required for an optimization method derived from ISHD to achieve a predefined level of suboptimality. Then, we formulate a novel learning to optimize (L2O) problem aimed at minimizing the stopping time subject to the convergence and stability condition. To navigate this learning problem, we present an algorithm combining stochastic optimization and the penalty method (StoPM). The convergence of StoPM using the conservative gradient is proved. Empirical validation of our framework is conducted through extensive numerical experiments across a diverse set of optimization problems. These experiments showcase the superior performance of the learned optimization methods.
Optimization and Control, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two core questions:

1. **How can the fast convergence of continuous-time models be carried over to stable convergence of discrete-time iterates under an explicit discretization scheme?** (Problem P1)

   The paper integrates the inertial system with Hessian-driven damping (ISHD) with learning-based methods to develop optimization algorithms. The authors first establish a convergence condition for the ISHD solution trajectory. They then show that, under a stability condition on the ISHD coefficients, the sequence produced by the explicit Euler discretization of ISHD also converges. This yields a large family of practical discrete-time optimization methods.

2. **How should the coefficients of the ISHD equation be chosen for a specific class of problems?** (Problem P2)

   The paper introduces the "stopping time", the time an ISHD-derived method needs to reach a predefined level of suboptimality, as a measure of efficiency and uses it as the training loss. It formulates a novel learning to optimize (L2O) problem that minimizes the stopping time subject to the convergence and stability conditions. To solve this problem, the authors propose StoPM, an algorithm combining stochastic optimization with the penalty method, and validate the framework through experiments.

The main contributions of the paper can be summarized as follows:

- Establishing conditions that ensure the convergence of the sequence {x_k} generated by explicit Euler discretization.
- Introducing the "stopping time" measure and proving its differentiability under certain conditions, which makes it suitable as an objective for numerical optimization.
- Proposing a general L2O framework and a corresponding algorithm that automatically selects ISHD coefficients for a given class of optimization problems.
- Demonstrating the superior performance of the learned optimization methods through extensive numerical experiments.

Together, these contributions offer a route to designing optimization algorithms that are both efficient and guaranteed to converge.
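The two ingredients summarized above, an explicit Euler discretization of ISHD and the stopping time, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ISHD form commonly used in the literature, x''(t) + (α/t) x'(t) + β(t) ∇²f(x(t)) x'(t) + γ(t) ∇f(x(t)) = 0, and simplifies it by taking β and γ constant (in the paper they are the coefficients to be learned). The test function is a hypothetical diagonal quadratic f(x) = ½ Σ λ_i x_i² with minimum value 0, so the gradient and Hessian-vector product are available in closed form. The function name, step size, and all default values are illustrative assumptions.

```python
def ishd_euler_stopping_time(lams, x0, h=0.01, t0=1.0, alpha=3.0,
                             beta=0.5, gamma=1.0, eps=1e-6, max_steps=20000):
    """Explicit Euler on the first-order form of ISHD for f(x) = 0.5 * sum(lam_i * x_i^2).

    Returns (elapsed_time, gap), where elapsed_time is the first k * h at which
    f(x_k) - f* <= eps, or None if the tolerance is not reached in max_steps.
    All parameter values are illustrative assumptions, not taken from the paper.
    """
    x = list(x0)
    v = [0.0] * len(x)          # velocity x'(t), started at rest
    t = t0
    gap = float("inf")
    for k in range(max_steps + 1):
        # Suboptimality gap; f* = 0 for this quadratic.
        gap = 0.5 * sum(l * xi * xi for l, xi in zip(lams, x))
        if gap <= eps:
            return k * h, gap   # discrete analogue of the stopping time
        # v' = -(alpha/t) v - beta * Hessian * v - gamma * gradient,
        # with Hessian = diag(lams) and gradient_i = lam_i * x_i.
        accel = [-(alpha / t) * vi - beta * l * vi - gamma * l * xi
                 for l, xi, vi in zip(lams, x, v)]
        # Explicit Euler: both updates use the state from step k.
        x = [xi + h * vi for xi, vi in zip(x, v)]
        v = [vi + h * ai for vi, ai in zip(v, accel)]
        t += h
    return None, gap


lams = [1.0, 10.0]
T_loose, gap_loose = ishd_euler_stopping_time(lams, [1.0, 1.0], eps=1e-4)
T_tight, gap_tight = ishd_euler_stopping_time(lams, [1.0, 1.0], eps=1e-6)
print(T_loose, T_tight)
```

In this sketch the stopping time is simply a number that depends on the coefficients (α, β, γ) and the step size h. The L2O problem described above treats that dependence as a loss to be minimized, subject to constraints that keep the discretization convergent and stable.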