Abstract: Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly stochastic framework with impractical variance assumptions, enforce a fixed number of lower-level iterations, and require extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum of functions in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring compared with adaptive deterministic bilevel methods.
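For concreteness, the inexact stochastic hypergradient mentioned in the abstract can be written with the standard implicit-function-theorem formula. The notation below is assumed for illustration and may differ from the paper's: per-sample lower-level objectives $g_i$ (strongly convex in $x$, so the Hessian is invertible), upper-level losses $f_i$, a lower-level accuracy $\varepsilon$, a linear-solve tolerance $\delta$, a sampled minibatch $B_k$, and a step size $\alpha_k$. Here $\nabla^2_{x\theta} g_i$ denotes the Jacobian of $\nabla_x g_i$ with respect to $\theta$, and each $z_i$ is evaluated at $\theta = \theta_k$ in the last line, which is a generic inexact stochastic gradient step rather than a verbatim statement of the paper's algorithm.

```latex
\begin{aligned}
&\min_{\theta}\; f(\theta) := \frac{1}{n}\sum_{i=1}^{n} f_i\big(\hat{x}_i(\theta)\big),
\qquad \hat{x}_i(\theta) := \arg\min_{x}\, g_i(x,\theta), \\
&\nabla (f_i\circ\hat{x}_i)(\theta) = -\,\nabla^2_{x\theta} g_i\big(\hat{x}_i(\theta),\theta\big)^{\top}
\big[\nabla^2_{xx} g_i\big(\hat{x}_i(\theta),\theta\big)\big]^{-1} \nabla f_i\big(\hat{x}_i(\theta)\big), \\
&z_i := -\,\nabla^2_{x\theta} g_i(x_i^{\varepsilon},\theta)^{\top} q_i,
\quad \big\|\nabla^2_{xx} g_i(x_i^{\varepsilon},\theta)\, q_i - \nabla f_i(x_i^{\varepsilon})\big\| \le \delta,
\quad \|x_i^{\varepsilon} - \hat{x}_i(\theta)\| \le \varepsilon, \\
&\theta_{k+1} = \theta_k - \alpha_k \, \frac{1}{|B_k|}\sum_{i \in B_k} z_i .
\end{aligned}
```

The inexactness enters twice (the approximate lower-level solution $x_i^{\varepsilon}$ and the approximate linear solve $q_i$), and the stochasticity enters only through the minibatch $B_k$ drawn from the upper-level sum.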

What problem does this paper attempt to address?

The paper addresses bilevel learning in large-scale machine learning, inverse problems, and imaging applications. Specifically, the authors study how to perform bilevel optimization efficiently when only inexact stochastic hypergradients are available.

### Main Problems

1. **Limitations of existing methods**
   - **Deterministic methods**: Existing adaptive methods mainly rely on deterministic formulations, which become inefficient on large-scale data.
   - **Stochastic methods**: Existing stochastic methods usually adopt a doubly stochastic framework, rely on impractical variance assumptions, fix the number of lower-level iterations, and require extensive parameter tuning.
2. **Challenges in bilevel optimization**
   - In large-scale problems, computing the exact lower-level solution is impractical, so approximate solutions must be used.
   - Stochasticity arises from data sampling in the upper-level problem, which leads to inexact stochastic hypergradients.
3. **Convergence and generalization**
   - Existing methods lack sufficient analysis of how the upper-level solver behaves, and whether it converges, when lower-level solutions are inexact.
   - It remains open how to guarantee that the algorithm still converges and generalizes well when hypergradients are inexact.

### Solutions

The authors propose a bilevel optimization framework that remains efficient in the presence of inexact stochastic hypergradients. Specific contributions include:

- **Theoretical analysis**: They connect inexact stochastic hypergradients to stochastic optimization theory for nonconvex objectives and prove convergence under mild assumptions.
- **Algorithm design**: They propose the Inexact Stochastic Gradient Descent (ISGD) algorithm and demonstrate its performance on image denoising and deblurring tasks through numerical experiments.
- **Generalization**: Experiments show that the method generalizes better when larger datasets are used.

### Numerical Experiments

The authors validate the method through a series of numerical experiments. On image denoising and deblurring tasks, ISGD converges faster and generalizes better than existing adaptive deterministic bilevel learning algorithms.

In summary, the paper targets the computational efficiency and generalization of bilevel learning in large-scale problems, proposes an optimization framework based on inexact stochastic hypergradients, and supports it with theoretical analysis and experiments.
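The framework summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' code, and every concrete choice in it is an assumption for illustration: a toy 1-D Tikhonov denoising problem with per-sample lower level g_i(x, θ) = ½‖x − y_i‖² + ½·exp(θ)·‖Dx‖² (D a forward-difference matrix), upper level f_i = ½‖x̂_i(θ) − x_i*‖², gradient descent stopped once ‖∇g_i‖ ≤ ε as the inexact lower-level solver (valid because g_i is 1-strongly convex), conjugate gradient to tolerance δ for the linear system, and the schedules α_k = α₀/√(k+1) and ε_k = δ_k = 0.1/(k+1). The helpers `diff_op`, `lower_level`, `cg`, `hypergradient`, `upper_loss`, `isgd`, and `make_data` are hypothetical names.

```python
import numpy as np


def diff_op(d):
    """Forward-difference matrix D of shape (d-1, d): (Dx)_j = x_{j+1} - x_j."""
    return np.eye(d - 1, d, k=1) - np.eye(d - 1, d)


def lower_level(y, lam, DtD, eps, x0, max_iter=10_000):
    """Inexact lower-level solve (assumed solver): gradient descent on the
    1-strongly convex g(x) = 0.5*||x - y||^2 + 0.5*lam*||Dx||^2, stopped once
    ||grad g|| <= eps, which then guarantees ||x - x_hat|| <= eps."""
    L = 1.0 + 4.0 * lam  # Lipschitz bound for grad g, since ||D^T D|| <= 4
    x = x0.copy()
    for _ in range(max_iter):
        grad = (x - y) + lam * (DtD @ x)
        if np.linalg.norm(grad) <= eps:
            break
        x = x - grad / L
    return x


def cg(A, b, tol, max_iter=1000):
    """Conjugate gradient for SPD A, stopped once the residual ||b - A q|| <= tol."""
    q = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Ap = A @ p
        step = rs / (p @ Ap)
        q += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return q


def hypergradient(theta, y, x_true, DtD, eps, delta, x0):
    """Inexact hypergradient of f_i(theta) = 0.5*||x_hat(theta) - x_true||^2
    with lam = exp(theta), via the implicit function theorem."""
    lam = np.exp(theta)
    x = lower_level(y, lam, DtD, eps, x0)  # x approximates x_hat(theta)
    H = np.eye(len(y)) + lam * DtD         # Hessian of g in x
    q = cg(H, x - x_true, delta)           # H q approximates grad f_i(x)
    cross = lam * (DtD @ x)                # derivative of grad_x g w.r.t. theta
    return -(cross @ q), x


def upper_loss(theta, Y, X_true):
    """Exact mean upper-level loss via a direct solve, used only for checking."""
    d = Y.shape[1]
    D = diff_op(d)
    H = np.eye(d) + np.exp(theta) * (D.T @ D)
    X_hat = np.linalg.solve(H, Y.T).T
    return 0.5 * np.mean(np.sum((X_hat - X_true) ** 2, axis=1))


def isgd(theta0, Y, X_true, n_iter=100, batch=4, alpha0=0.5, seed=0):
    """Minibatch SGD on theta using inexact hypergradients."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    D = diff_op(d)
    DtD = D.T @ D
    warm = Y.copy()  # per-sample warm starts for the lower-level solver
    theta = theta0
    for k in range(n_iter):
        eps = delta = 0.1 / (k + 1)  # assumed accuracy schedule
        idx = rng.choice(n, size=batch, replace=False)  # sample the upper-level sum
        grads = []
        for i in idx:
            g, warm[i] = hypergradient(theta, Y[i], X_true[i], DtD, eps, delta, warm[i])
            grads.append(g)
        theta = theta - alpha0 / np.sqrt(k + 1) * np.mean(grads)  # assumed step size
    return theta


def make_data(n=16, d=64, sigma=0.3, seed=1):
    """Toy 1-D denoising data: randomly shifted sinusoids plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, d, endpoint=False)
    X_true = np.sin(2 * np.pi * t + rng.uniform(0, 2 * np.pi, size=(n, 1)))
    Y = X_true + sigma * rng.standard_normal(X_true.shape)
    return Y, X_true


if __name__ == "__main__":
    Y, X_true = make_data()
    theta = isgd(0.0, Y, X_true)
    print(f"theta = {theta:.3f}, loss {upper_loss(0.0, Y, X_true):.4f} -> "
          f"{upper_loss(theta, Y, X_true):.4f}")
```

Warm-starting each sample's lower-level solve from its previous solution keeps the inexact solves cheap as θ changes, and the accuracy schedule ties the lower-level effort to a tolerance rather than to a fixed iteration count. A finite-difference check of `hypergradient` against `upper_loss` at tight tolerances is a quick way to confirm the implicit-function-theorem formula before running the stochastic loop.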

Bilevel Learning with Inexact Stochastic Gradients

Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems

An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning

An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation

Derivative-free stochastic bilevel optimization for inverse problems

Analyzing Inexact Hypergradients for Bilevel Learning

Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis

General single-loop methods for bilevel parameter learning

LancBiO: dynamic Lanczos-aided bilevel optimization via Krylov subspace

An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

An Inexact Conditional Gradient Method for Constrained Bilevel Optimization

Efficient gradient-based methods for bilevel learning via recycling Krylov subspaces

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis

On Momentum-Based Gradient Methods for Bilevel Optimization with Nonconvex Lower-Level

Bilevel Optimization without Lower-Level Strong Convexity from the Hyper-Objective Perspective

Bilevel learning of regularization models and their discretization for image deblurring and super-resolution

Adaptive Mirror Descent Bilevel Optimization

A Generalized Alternating Method for Bilevel Learning under the Polyak-Łojasiewicz Condition

Online Nonconvex Bilevel Optimization with Bregman Divergences

Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem