Abstract: We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of \texttt{RAHGD} for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.
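
The abstract relies on the standard nonconvex-strongly-convex bilevel formulation without stating it. For reference, here is a sketch of that formulation and of the two stationarity notions, assuming the notation that is conventional in this literature (the symbols $f$, $g$, $\varphi$, $\mu$, and $\ell$ are not defined in the abstract itself):

$$
\min_{x\in\mathbb{R}^{d_x}} \varphi(x) := f\big(x, y^*(x)\big), \qquad y^*(x) := \arg\min_{y\in\mathbb{R}^{d_y}} g(x, y),
$$

where $g(x,\cdot)$ is $\mu$-strongly convex and $\kappa = \ell/\mu$ for a smoothness constant $\ell$. Because $y^*(x)$ is unique, the implicit function theorem gives the hypergradient

$$
\nabla\varphi(x) = \nabla_x f\big(x, y^*(x)\big) - \nabla^2_{xy} g\big(x, y^*(x)\big)\,\big[\nabla^2_{yy} g\big(x, y^*(x)\big)\big]^{-1}\,\nabla_y f\big(x, y^*(x)\big).
$$

A point $x$ is an $\epsilon$-first-order stationary point if $\|\nabla\varphi(x)\| \le \epsilon$, and an $(\epsilon,\delta)$-second-order stationary point if, in addition, $\lambda_{\min}\big(\nabla^2\varphi(x)\big) \ge -\delta$. The abstract's second-order guarantee corresponds to $\delta = \mathcal{O}(\kappa^{2.5}\sqrt{\epsilon})$.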

What problem does this paper attempt to address?

This paper addresses two problems in nonconvex-strongly-convex bilevel optimization:

1. **Finding an approximate first-order stationary point (FOSP)**: The paper proposes Restarted Accelerated HyperGradient Descent (RAHGD), which finds an $\epsilon$-first-order stationary point of the objective with $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity. Here, $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the target accuracy.
2. **Finding an approximate second-order stationary point (SOSP)**: To escape saddle points, the paper also proposes a perturbed variant of RAHGD, Perturbed Restarted Accelerated HyperGradient Descent (PRAHGD). PRAHGD finds an $(\epsilon, O(\kappa^{2.5}\sqrt{\epsilon}))$-second-order stationary point with the same complexity, $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$. This improves on the complexity of existing methods for finding second-order stationary points.

### Main contributions

1. **RAHGD method**:
   - Combines Nesterov's accelerated gradient descent (AGD) with the conjugate gradient (CG) method: AGD approximately solves the lower-level problem for $y^*(x)$, and CG approximately solves the linear system that appears in the hypergradient, yielding an inexact hypergradient of the objective.
   - By running an accelerated outer loop on this inexact hypergradient and restarting it appropriately, RAHGD finds an $\epsilon$-first-order stationary point within $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$ first-order oracle queries.
2. **PRAHGD method**:
   - Adds a perturbation step to RAHGD so that the algorithm can escape saddle points and find an $(\epsilon, O(\kappa^{2.5}\sqrt{\epsilon}))$-second-order stationary point.
   - Its complexity, $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$, matches that of RAHGD and improves on existing methods.
3. **Application to minimax optimization**:
   - Applies PRAHGD to nonconvex-strongly-concave minimax problems, yielding the Perturbed Restarted Accelerated Gradient Descent Ascent (PRAGDA) algorithm.
   - PRAGDA finds an $(\epsilon, O(\kappa^{1.5}\sqrt{\epsilon}))$-second-order stationary point within $\tilde{O}(\kappa^{1.75}\epsilon^{-1.75})$ first-order oracle queries, improving on the existing upper bound for this problem class.
4. **Experimental verification**:
   - Evaluates the proposed algorithms on several tasks: synthetic minimax problems, hyperparameter optimization on the MNIST dataset, and hyperparameter optimization for logistic regression.
   - In these experiments, the proposed algorithms converge faster than the baseline algorithms.

### Related work

- Research on bilevel optimization dates back to the 1970s and has made significant recent progress in applications such as meta-learning, reinforcement learning, and hyperparameter optimization.
- Existing work focuses mainly on finding first-order stationary points; finding second-order stationary points has received comparatively little attention.
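
The RAHGD contribution described above pairs an inner AGD solve with a CG solve to form an inexact hypergradient, then runs an accelerated outer loop with restarts. The following is a minimal NumPy sketch of that structure on a hypothetical quadratic toy problem, not the authors' implementation. The toy objectives, the function names `agd`, `conjugate_gradient`, `inexact_hypergradient`, and `restarted_agd`, the restart rule (reset momentum once $k\sum_t\|x_{t+1}-x_t\|^2$ exceeds a threshold or after $K$ steps), and all step sizes are illustrative assumptions; the perturbation step of PRAHGD is omitted.

```python
import numpy as np


def agd(grad, y0, L, mu, iters):
    """Nesterov's AGD with constant momentum for an L-smooth, mu-strongly convex function."""
    beta = (np.sqrt(L / mu) - 1.0) / (np.sqrt(L / mu) + 1.0)
    y_prev, y = y0.copy(), y0.copy()
    for _ in range(iters):
        z = y + beta * (y - y_prev)              # extrapolated (momentum) point
        y_prev, y = y, z - grad(z) / L           # gradient step taken from z
    return y


def conjugate_gradient(H, r, iters, tol=1e-12):
    """Approximately solve H v = r for symmetric positive definite H, starting from v = 0."""
    v = np.zeros_like(r)
    res = r.copy()                               # residual r - H v for v = 0
    p = res.copy()
    rs = res @ res
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Hp = H @ p
        alpha = rs / (p @ Hp)
        v += alpha * p
        res -= alpha * Hp
        rs_new = res @ res
        p = res + (rs_new / rs) * p
        rs = rs_new
    return v


def inexact_hypergradient(x, y0, H, A, c, lam, L, mu, agd_iters=100, cg_iters=50):
    """Hypothetical toy: g(x, y) = 0.5 y'Hy - y'Ax,  f(x, y) = 0.5 ||y - c||^2 + 0.5 lam ||x||^2."""
    y_hat = agd(lambda y: H @ y - A @ x, y0, L, mu, agd_iters)  # y_hat approximates y*(x)
    v = conjugate_gradient(H, y_hat - c, cg_iters)              # v approximates [grad_yy g]^{-1} grad_y f
    return lam * x + A.T @ v, y_hat                             # grad_x f - grad_xy g v, with grad_xy g = -A'


def restarted_agd(x0, oracle, eta, theta, restart_b2, K, eps, max_iters):
    """Assumed restart rule: reset momentum once k * sum_t ||x_{t+1} - x_t||^2 > restart_b2 or k == K."""
    x, total = x0.copy(), 0
    while total < max_iters:
        x_prev, x_cur, moved, k = x.copy(), x.copy(), 0.0, 0
        while k < K and total < max_iters:
            w = x_cur + (1.0 - theta) * (x_cur - x_prev)     # momentum step
            x_next = w - eta * oracle(w)                      # step along the inexact hypergradient
            moved += float(np.sum((x_next - x_cur) ** 2))
            x_prev, x_cur = x_cur, x_next
            k += 1
            total += 1
            if k * moved > restart_b2:                        # iterates moved too far: restart
                break
        x = x_cur
        if np.linalg.norm(oracle(x)) <= eps:                  # approximate eps-FOSP reached
            break
    return x, total


rng = np.random.default_rng(0)
dx, dy, mu, L, lam = 4, 5, 1.0, 10.0, 0.1
Q, _ = np.linalg.qr(rng.standard_normal((dy, dy)))
H = Q @ np.diag(np.linspace(mu, L, dy)) @ Q.T            # eigenvalues in [mu, L], so kappa = 10
A = rng.standard_normal((dy, dx))
c = rng.standard_normal(dy)


def exact_hypergradient(x):
    y_star = np.linalg.solve(H, A @ x)
    return lam * x + A.T @ np.linalg.solve(H, y_star - c)


x_test = rng.standard_normal(dx)
g_hat, _ = inexact_hypergradient(x_test, np.zeros(dy), H, A, c, lam, L, mu, agd_iters=200)
print("hypergradient error:", np.linalg.norm(g_hat - exact_hypergradient(x_test)))

state = {"y": np.zeros(dy)}                              # warm start for the lower-level AGD


def oracle(x):
    g, state["y"] = inexact_hypergradient(x, state["y"], H, A, c, lam, L, mu)
    return g


L_phi = np.linalg.norm(A, 2) ** 2 / mu ** 2 + lam        # upper bound on the smoothness of phi
x_out, n_iters = restarted_agd(np.zeros(dx), oracle, eta=1.0 / L_phi, theta=0.1,
                               restart_b2=1.0, K=100, eps=1e-6, max_iters=5000)
print("iterations:", n_iters, "final ||grad phi||:", np.linalg.norm(exact_hypergradient(x_out)))
```

Because the toy lower-level problem is quadratic, the exact hypergradient is available in closed form, which makes it easy to check the inexact estimate. Warm-starting the AGD solve from the previous $\hat{y}$ keeps the inner error small once the outer iterates stop moving much.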

Accelerating Inexact HyperGradient Descent for Bilevel Optimization
