Abstract: This paper studies a class of simple bilevel optimization problems where we minimize a composite convex function at the upper level subject to a composite convex lower-level problem. Existing methods either provide only asymptotic guarantees for the upper-level objective or attain slow sublinear convergence rates. We propose a bisection algorithm to find a solution that is $\epsilon_f$-optimal for the upper-level objective and $\epsilon_g$-optimal for the lower-level objective. In each iteration, the binary search narrows the interval by assessing the feasibility of an inequality system. Under mild conditions, the total operation complexity of our method is ${\tilde {\mathcal{O}}}\left(\max\{\sqrt{L_{f_1}/\epsilon_f},\sqrt{L_{g_1}/\epsilon_g} \} \right)$. Here, a unit operation can be a function evaluation, a gradient evaluation, or an invocation of the proximal mapping; $L_{f_1}$ and $L_{g_1}$ are the Lipschitz constants of the gradients of the upper- and lower-level objectives' smooth components; and ${\tilde {\mathcal{O}}}$ hides logarithmic terms. Our approach achieves a near-optimal rate, matching the optimal rate in unconstrained smooth or composite convex optimization when disregarding logarithmic terms. Numerical experiments demonstrate the effectiveness of our method.

What problem does this paper attempt to address?

### The problems the paper attempts to solve

This paper studies a class of simple bilevel optimization problems, in which a composite convex function is minimized at the upper level subject to a composite convex lower-level problem. Existing methods either provide only asymptotic guarantees for the upper-level objective or achieve a slow sublinear convergence rate. This paper proposes a bisection algorithm to find a solution that is $ \epsilon_f $-optimal for the upper-level objective and $ \epsilon_g $-optimal for the lower-level objective.

### Specific problem description

Specifically, the paper focuses on convex bilevel optimization problems of the following form:

\[ \text{(P)} \quad \min_{x \in \mathbb{R}^n} f(x) := f_1(x) + f_2(x) \quad \text{s.t.} \quad x \in \arg \min_{z \in \mathbb{R}^n} g(z) := g_1(z) + g_2(z), \]

where:

- $ f_1 $ and $ g_1 $ are convex and continuously differentiable functions whose gradients $ \nabla f_1 $ and $ \nabla g_1 $ are $ L_{f_1} $-Lipschitz continuous and $ L_{g_1} $-Lipschitz continuous, respectively.
- $ f_2 $ and $ g_2 $ are proper lower semicontinuous (l.s.c.) convex functions.
- It is assumed that $ g $ is not strongly convex and that the lower-level problem has multiple optimal solutions, that is, the optimal solution set $ X^*_g $ of the lower-level problem is not a singleton.

### Objectives

The objective of the paper is to find an $ (\epsilon_f, \epsilon_g) $-optimal solution $ \hat{x} $, satisfying:

\[ f(\hat{x}) - p^* \leq \epsilon_f \quad \text{and} \quad g(\hat{x}) - g^* \leq \epsilon_g, \]

where $ p^* $ is the optimal value of problem (P), and $ g^* $ is the optimal value of the unconstrained lower-level problem:

\[ \min_{x \in \mathbb{R}^n} g(x) := g_1(x) + g_2(x). \]

### Methods

The paper proposes an algorithm based on the bisection method, which gradually narrows an interval containing $ p^* $. In each iteration, the bisection search adjusts the interval according to whether an inequality system is feasible. Under mild conditions, the total operation complexity of this method is:

\[ \tilde{O}\left(\max\left\{\sqrt{\frac{L_{f_1}}{\epsilon_f}}, \sqrt{\frac{L_{g_1}}{\epsilon_g}}\right\}\right), \]

where a unit operation can be a function evaluation, a gradient evaluation, or a call to the proximal mapping; $ L_{f_1} $ and $ L_{g_1} $ are the Lipschitz constants of the gradients of the smooth parts of the upper-level and lower-level objective functions, respectively; and $ \tilde{O} $ hides logarithmic terms.

### Contributions

- Under mild conditions, a new bisection method is proposed that finds an $ (\epsilon_f, \epsilon_g) $-optimal solution within an operation complexity of $ \tilde{O}\left(\max\left\{\sqrt{\frac{L_{f_1}}{\epsilon_f}}, \sqrt{\frac{L_{g_1}}{\epsilon_g}}\right\}\right) $.
- By introducing the Hölderian error bound assumption of the lower-level problem and other smoothness assumptions, the method can be in $ \tilde{O}\left(\frac{1}{\sqrt{\epsilon_f}}\right) $
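To make the bisection idea described under Methods concrete, here is a minimal Python sketch on a toy problem. It is an illustration under stated assumptions, not the paper's algorithm: the functions `f` and `g`, the helper names, the initial interval, and the feasibility test (checking whether the minimum of $g$ over the level set $\{x : f(x) \le c\}$ is within $\epsilon_g$ of $g^*$) are all hypothetical choices made for this example. The subproblem is solved with plain projected gradient descent, which is not accelerated and is therefore not expected to attain the complexity stated above. In the toy problem, $g(x) = \tfrac12 (x_1 + x_2 - 2)^2$ has a whole line of minimizers with $g^* = 0$, and $f(x) = \tfrac12 \|x\|^2$, so $p^* = 1$ is attained at $(1, 1)$.

```python
import math

def f(x):
    # Toy upper-level objective (assumption for this sketch).
    return 0.5 * (x[0] ** 2 + x[1] ** 2)

def g(x):
    # Toy lower-level objective; minimizers form the line x1 + x2 = 2, so g* = 0.
    return 0.5 * (x[0] + x[1] - 2.0) ** 2

def project_ball(x, radius):
    # Euclidean projection onto {x : ||x|| <= radius}.
    norm = math.hypot(x[0], x[1])
    if norm <= radius:
        return x
    scale = radius / norm
    return (x[0] * scale, x[1] * scale)

def min_g_on_level_set(c, iters=500):
    # Approximately minimize g over {x : f(x) <= c}, a ball of radius sqrt(2c),
    # using projected gradient descent with step 1/L (grad g is 2-Lipschitz here).
    radius = math.sqrt(2.0 * c)
    step = 0.5
    x = (0.0, 0.0)
    for _ in range(iters):
        r = x[0] + x[1] - 2.0  # grad g(x) = (r, r)
        x = project_ball((x[0] - step * r, x[1] - step * r), radius)
    return x

def bisection(lo, hi, g_star, eps_f, eps_g):
    # Invariant: the system {f(x) <= hi, g(x) <= g* + eps_g} is (approximately)
    # feasible, while the same system with lo in place of hi is not.
    best = min_g_on_level_set(hi)
    while hi - lo > eps_f:
        c = 0.5 * (lo + hi)
        x = min_g_on_level_set(c)
        if g(x) - g_star <= eps_g:
            hi, best = c, x  # feasible: shrink the interval from above
        else:
            lo = c           # infeasible: shrink the interval from below
    return best, lo, hi

# lo = min f = 0; hi = f((2, 0)) = 2, the value of f at one lower-level minimizer.
x_hat, lo, hi = bisection(0.0, 2.0, g_star=0.0, eps_f=1e-3, eps_g=1e-6)
print(x_hat, f(x_hat), g(x_hat))
```

Each halving costs one approximate solve of the constrained subproblem, so the number of outer iterations grows only logarithmically in $1/\epsilon_f$; the overall cost is dominated by how quickly each feasibility check can be carried out.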

Near-Optimal Convex Simple Bilevel Optimization with a Bisection Method
