Accelerating Inexact HyperGradient Descent for Bilevel Optimization

Haikuo Yang, Luo Luo, Chris Junchi Li, Michael I. Jordan
2023-07-01
Abstract: We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of \texttt{RAHGD} for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
This paper addresses two key challenges in nonconvex-strongly-convex bilevel optimization:

1. **Finding an approximate first-order stationary point (FOSP)**: The paper proposes a new method, Restarted Accelerated HyperGradient Descent (RAHGD), which finds an $\epsilon$-first-order stationary point of the objective function with complexity $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$. Here, $\kappa$ is the condition number of the lower-level objective function and $\epsilon$ is the required accuracy.
2. **Finding an approximate second-order stationary point (SOSP)**: The paper also proposes a perturbed version of RAHGD, Perturbed Restarted Accelerated HyperGradient Descent (PRAHGD). PRAHGD finds an $(\epsilon, O(\kappa^{2.5}\sqrt{\epsilon}))$-second-order stationary point within the same complexity $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$. This is a significant improvement over the complexity of existing methods for finding second-order stationary points.

### Main contributions

1. **RAHGD method**:
   - Proposes an algorithm that combines Nesterov's Accelerated Gradient Descent (AGD) and the Conjugate Gradient (CG) method to approximately compute the lower-level solution $y^*(x)$ and construct an inexact hypergradient of the objective function.
   - With appropriate restart and acceleration strategies, RAHGD finds an $\epsilon$-first-order stationary point within $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$ first-order oracle queries.
2. **PRAHGD method**:
   - Adds a perturbation step to RAHGD, enabling the algorithm to escape saddle points and find an $(\epsilon, O(\kappa^{2.5}\sqrt{\epsilon}))$-second-order stationary point.
   - The complexity of PRAHGD is $\tilde{O}(\kappa^{3.25}\epsilon^{-1.75})$, which improves on existing methods.
3. **Application in min-max optimization**:
   - Applies PRAHGD to nonconvex-strongly-concave min-max optimization problems and proposes the Perturbed Restarted Accelerated Gradient Descent Ascent (PRAGDA) algorithm.
   - PRAGDA finds an $(\epsilon, O(\kappa^{1.5}\sqrt{\epsilon}))$-second-order stationary point within $\tilde{O}(\kappa^{1.75}\epsilon^{-1.75})$ first-order oracle queries, which improves on existing methods.
4. **Experimental verification**:
   - Evaluates the proposed algorithms on several experimental tasks, including synthetic min-max problems, hyperparameter optimization on the MNIST dataset, and hyperparameter optimization in logistic regression.
   - The experimental results show that the proposed algorithms converge faster than various baseline algorithms.

### Related work

- Research on bilevel optimization problems can be traced back to the 1970s and has made significant progress in recent years in fields such as meta-learning, reinforcement learning, and hyperparameter optimization.
- Existing research mainly focuses on finding first-order stationary points, while research on finding second-order stationary points is relatively scarce.
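To make the inexact hypergradient construction mentioned under the RAHGD contribution more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It relies on the standard hypergradient formula for $\Phi(x) = f(x, y^*(x))$, namely $\nabla\Phi(x) = \nabla_x f - \nabla^2_{xy} g\,[\nabla^2_{yy} g]^{-1}\nabla_y f$ evaluated at $y = y^*(x)$, and approximates $y^*(x)$ with AGD and the linear solve with CG. The toy quadratic problem (`A`, `B`, `c`), the function names, and the iteration counts are all illustrative assumptions; the restart, upper-level acceleration, and perturbation steps of RAHGD/PRAHGD are deliberately left out.

```python
import numpy as np

def agd_lower_level(grad_y, y0, L, mu, num_steps):
    """Nesterov AGD for an L-smooth, mu-strongly-convex function of y."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)  # momentum weight
    y, z = y0.copy(), y0.copy()
    for _ in range(num_steps):
        y_next = z - grad_y(z) / L          # gradient step at the extrapolated point
        z = y_next + beta * (y_next - y)    # extrapolation (momentum)
        y = y_next
    return y

def conjugate_gradient(hvp, b, num_steps, tol=1e-20):
    """Approximately solve H v = b using only Hessian-vector products hvp(p) = H p."""
    v = np.zeros_like(b)
    r = b.copy()            # residual b - H v for the initial guess v = 0
    p = r.copy()
    rs = r @ r
    for _ in range(num_steps):
        if rs < tol:        # squared residual norm is small enough
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        v += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return v

# Hypothetical toy problem (an assumption for illustration only):
#   lower level: g(x, y) = 0.5 * y^T A y - y^T B x, with A symmetric positive definite
#   upper level: f(x, y) = 0.5 * ||y - c||^2 + 0.5 * ||x||^2
rng = np.random.default_rng(0)
dx, dy = 3, 5
M = rng.standard_normal((dy, dy))
A = M @ M.T + np.eye(dy)
B = rng.standard_normal((dy, dx))
c = rng.standard_normal(dy)
eigs = np.linalg.eigvalsh(A)        # ascending eigenvalues of A
mu, L = eigs[0], eigs[-1]           # strong convexity and smoothness of g in y

def inexact_hypergradient(x, agd_steps=200, cg_steps=20):
    # Step 1: approximate y*(x) by running AGD on g(x, .).
    y = agd_lower_level(lambda y_: A @ y_ - B @ x, np.zeros(dy), L, mu, agd_steps)
    # Step 2: partial gradients of f at (x, y).
    grad_x_f = x
    grad_y_f = y - c
    # Step 3: solve [nabla^2_yy g] v = nabla_y f with CG; here nabla^2_yy g = A.
    v = conjugate_gradient(lambda p: A @ p, grad_y_f, cg_steps)
    # Step 4: combine; for this g the cross Hessian nabla^2_xy g equals -B^T.
    return grad_x_f - (-B.T) @ v

def exact_hypergradient(x):
    # Closed form for the toy problem, used only as a reference.
    y_star = np.linalg.solve(A, B @ x)
    return x + B.T @ np.linalg.solve(A, y_star - c)

x0 = rng.standard_normal(dx)
err = np.linalg.norm(inexact_hypergradient(x0) - exact_hypergradient(x0))
print("hypergradient error:", err)
```

On this toy problem the closed-form `exact_hypergradient` serves as a check on the approximation. In a general bilevel problem the matrix `A` would not be available explicitly, which is why the sketch passes only gradient and Hessian-vector-product callables to the two solvers.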