Near-Optimal Convex Simple Bilevel Optimization with a Bisection Method

Jiulin Wang, Xu Shi, Rujun Jiang
2024-03-05
Abstract: This paper studies a class of simple bilevel optimization problems where we minimize a composite convex function at the upper-level subject to a composite convex lower-level problem. Existing methods either provide asymptotic guarantees for the upper-level objective or attain slow sublinear convergence rates. We propose a bisection algorithm to find a solution that is $\epsilon_f$-optimal for the upper-level objective and $\epsilon_g$-optimal for the lower-level objective. In each iteration, the binary search narrows the interval by assessing inequality system feasibility. Under mild conditions, the total operation complexity of our method is ${\tilde {\mathcal{O}}}\left(\max\{\sqrt{L_{f_1}/\epsilon_f},\sqrt{L_{g_1}/\epsilon_g} \} \right)$. Here, a unit operation can be a function evaluation, gradient evaluation, or the invocation of the proximal mapping, $L_{f_1}$ and $L_{g_1}$ are the Lipschitz constants of the upper- and lower-level objectives' smooth components, and ${\tilde {\mathcal{O}}}$ hides logarithmic terms. Our approach achieves a near-optimal rate, matching the optimal rate in unconstrained smooth or composite convex optimization when disregarding logarithmic terms. Numerical experiments demonstrate the effectiveness of our method.
Optimization and Control
What problem does this paper attempt to address?
### The problems the paper attempts to solve

This paper studies a class of simple bilevel optimization problems in which a composite convex function is minimized at the upper level, subject to a composite convex lower-level problem. Existing methods either provide only asymptotic guarantees for the upper-level objective or achieve a slow sublinear convergence rate. This paper proposes a bisection algorithm to find a solution that is \( \epsilon_f \)-optimal for the upper-level objective and \( \epsilon_g \)-optimal for the lower-level objective.

### Specific problem description

Specifically, the paper focuses on convex bilevel optimization problems of the following form:

\[ \text{(P)} \quad \min_{x \in \mathbb{R}^n} f(x) := f_1(x) + f_2(x) \quad \text{s.t.} \quad x \in \arg \min_{z \in \mathbb{R}^n} g(z) := g_1(z) + g_2(z), \]

where:

- \( f_1 \) and \( g_1 \) are convex and continuously differentiable functions whose gradients \( \nabla f_1 \) and \( \nabla g_1 \) are \( L_{f_1} \)-Lipschitz continuous and \( L_{g_1} \)-Lipschitz continuous, respectively.
- \( f_2 \) and \( g_2 \) are proper lower semicontinuous (l.s.c.) convex functions.
- It is assumed that \( g \) is not strongly convex and that the lower-level problem has multiple optimal solutions, that is, the optimal solution set \( X^*_g \) of the lower-level problem is not a singleton.

### Objectives

The objective of the paper is to find an \( (\epsilon_f, \epsilon_g) \)-optimal solution \( \hat{x} \), satisfying

\[ f(\hat{x}) - p^* \leq \epsilon_f \quad \text{and} \quad g(\hat{x}) - g^* \leq \epsilon_g, \]

where \( p^* \) is the optimal value of problem (P), and \( g^* \) is the optimal value of the unconstrained lower-level problem

\[ \min_{x \in \mathbb{R}^n} g(x) := g_1(x) + g_2(x). \]

### Methods

The paper proposes an algorithm based on the bisection method, which gradually narrows the interval containing \( p^* \) by evaluating the feasibility of the inequality system.
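
The interval-narrowing idea can be illustrated with a minimal sketch on a one-dimensional toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the objectives `f` and `g`, the known value `G_STAR`, the closed-form oracle `min_g_on_sublevel`, and the initial bracket. Here the feasibility check asks whether some point with \( f(x) \leq c \) is \( \epsilon_g \)-optimal for \( g \); in practice that check would presumably be carried out approximately by a first-order solver rather than in closed form.

```python
import math

# Hypothetical toy upper-level objective: smooth and convex.
def f(x):
    return (x - 3.0) ** 2

# Hypothetical toy lower-level objective: convex but not strongly convex,
# minimized on the whole interval [-1, 1], so the solution set is not a singleton.
def g(x):
    return max(abs(x) - 1.0, 0.0) ** 2

G_STAR = 0.0  # optimal lower-level value of the toy problem (assumed known here)

def min_g_on_sublevel(c):
    """Closed-form minimizer of g over {x : f(x) <= c} for this toy problem."""
    lo = 3.0 - math.sqrt(c)   # left end of the sublevel set [3 - sqrt(c), 3 + sqrt(c)]
    return max(1.0, lo)       # point of the sublevel set closest to [-1, 1]

def bisect_upper_value(lower, upper, eps_f, eps_g):
    """Bisection on a candidate value c, assuming lower <= p* and that the
    system {f(x) <= upper, g(x) <= g* + eps_g} is feasible at the start."""
    x_hat = min_g_on_sublevel(upper)
    while upper - lower > eps_f:
        c = 0.5 * (lower + upper)
        x = min_g_on_sublevel(c)
        if g(x) - G_STAR <= eps_g:
            upper, x_hat = c, x   # system feasible: keep the lower half
        else:
            lower = c             # system infeasible: p* lies above c
    return x_hat, lower, upper

# Initial bracket: 0 is the unconstrained minimum of f, and f(0) = 9 is the
# upper-level value at the lower-level minimizer x = 0.
x_hat, lower, upper = bisect_upper_value(0.0, 9.0, 1e-6, 1e-8)
print(x_hat, f(x_hat), g(x_hat))
```

For this toy problem the lower-level solution set is \( [-1, 1] \), so \( p^* = f(1) = 4 \). The returned point satisfies \( f(\hat{x}) - p^* \leq \epsilon_f \) and \( g(\hat{x}) - g^* \leq \epsilon_g \), and the number of bisection steps grows only logarithmically in the initial bracket width divided by \( \epsilon_f \).
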
In each iteration, the bisection search shrinks the interval according to whether the inequality system is feasible. Under mild conditions, the total operation complexity of this method is

\[ \tilde{O}\left(\max\left\{\sqrt{\frac{L_{f_1}}{\epsilon_f}}, \sqrt{\frac{L_{g_1}}{\epsilon_g}}\right\}\right), \]

where a unit operation can be a function evaluation, a gradient evaluation, or a call to the proximal mapping, \( L_{f_1} \) and \( L_{g_1} \) are the Lipschitz constants of the gradients of the smooth parts of the upper-level and lower-level objective functions, respectively, and \( \tilde{O} \) hides logarithmic terms.

### Contributions

- Under mild conditions, a new bisection method is proposed that finds an \( (\epsilon_f, \epsilon_g) \)-optimal solution with operation complexity \( \tilde{O}\left(\max\left\{\sqrt{\frac{L_{f_1}}{\epsilon_f}}, \sqrt{\frac{L_{g_1}}{\epsilon_g}}\right\}\right) \).
- By introducing the Hölderian error bound assumption of the lower-level problem and other smoothness assumptions, the method can be in \( \tilde{O}\left(\frac{1}{\sqrt{\epsilon_f}}\right) \)