Bilevel Learning with Inexact Stochastic Gradients

Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt
2024-12-17
Abstract: Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework with impractical variance assumptions, enforce a fixed number of lower-level iterations, and require extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions in the upper-level. Stochasticity arises from data sampling in the upper-level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring in comparison with adaptive deterministic bilevel methods.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is bilevel learning in large-scale machine learning, inverse problems, and imaging applications. Specifically, the authors focus on how to perform efficient bilevel optimization when only inexact stochastic gradients are available.

### Main Problems

1. **Limitations of Existing Methods**:
   - **Deterministic Methods**: Existing adaptive methods mainly rely on deterministic formulations, which are less efficient on large-scale data.
   - **Stochastic Methods**: Existing stochastic methods usually adopt a doubly-stochastic framework, rely on impractical variance assumptions, fix the number of lower-level iterations, and require extensive parameter tuning.
2. **Challenges in Bilevel Optimization**:
   - In large-scale problems, computing the exact lower-level solution is impractical, so approximate solutions must be used.
   - Stochasticity stems from data sampling in the upper-level problem, which leads to inexact stochastic hypergradients.
3. **Convergence and Generalization Ability**:
   - Existing methods offer insufficient analysis of how the upper-level solver behaves and converges when the lower-level solutions are inexact.
   - The open question is how to ensure that the algorithm still converges and generalizes well in the presence of inexact gradients.

### Solutions

The authors propose a bilevel optimization framework that remains efficient when only inexact stochastic gradients are available. Specific contributions include:

- **Theoretical Analysis**: Establish the connection between inexact stochastic hypergradients and stochastic optimization theory for nonconvex objectives, and prove convergence under mild assumptions.
- **Algorithm Design**: Propose the Inexact Stochastic Gradient Descent (ISGD) algorithm, and demonstrate its performance on image denoising and deblurring tasks through numerical experiments.
- **Generalization Ability**: Experiments indicate that the method generalizes better when trained on larger datasets.

### Numerical Experiments

The authors evaluate the proposed method in a series of numerical experiments. In image denoising and deblurring tasks, ISGD converges faster and generalizes better than existing adaptive deterministic bilevel learning algorithms.

In summary, this paper targets the computational efficiency and generalization of bilevel learning in large-scale problems. It proposes an optimization framework based on inexact stochastic gradients and supports it with both convergence theory and experiments.
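
The idea of an inexact stochastic hypergradient can be illustrated with a minimal sketch. This is not the authors' implementation; it is a toy under stated assumptions. The lower-level problem is assumed to be 1D denoising, g_i(x, θ) = ½‖x − y_i‖² + ½ e^θ ‖Dx‖², with a scalar parameter θ and a forward-difference operator D. The upper-level loss is assumed to be f_i(θ) = ½‖x_i(θ) − x_i^true‖². The accuracy schedule `tol0 / k`, the constant step size, the reshuffled sampling, and all function names are hypothetical choices made for illustration. The hypergradient follows from the implicit function theorem, with both the lower-level solve and the linear solve truncated at a tolerance.

```python
import numpy as np

def diff_operator(n):
    """Forward-difference matrix D of shape (n - 1, n)."""
    return np.diff(np.eye(n), axis=0)

def solve_lower_level(y, theta, D, tol, max_iter=10_000):
    """Gradient descent on g(x) = 0.5||x - y||^2 + 0.5 e^theta ||Dx||^2.
    g is 1-strongly convex, so ||grad g|| <= tol implies ||x - x*|| <= tol."""
    lam = np.exp(theta)
    step = 1.0 / (1.0 + 4.0 * lam)  # 1/L, using ||D^T D|| <= 4
    x = y.copy()
    for _ in range(max_iter):
        grad = (x - y) + lam * (D.T @ (D @ x))
        if np.linalg.norm(grad) <= tol:
            break
        x = x - step * grad
    return x

def conjugate_gradient(H, b, tol, max_iter=1_000):
    """Approximately solve H w = b (H symmetric positive definite)."""
    w = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Hp = H @ p
        alpha = rs / (p @ Hp)
        w += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w

def inexact_hypergradient(y, x_true, theta, D, tol):
    """Implicit-function-theorem hypergradient of f_i at theta, computed
    from an approximate lower-level solution and an approximate linear solve."""
    lam = np.exp(theta)
    x = solve_lower_level(y, theta, D, tol)
    H = np.eye(y.size) + lam * (D.T @ D)        # Hessian of g in x
    w = conjugate_gradient(H, x - x_true, tol)  # w ~ H^{-1} grad_x f
    cross = lam * (D.T @ (D @ x))               # d/dtheta of grad_x g
    return -float(cross @ w)

def upper_loss(ys, x_trues, theta, D):
    """Mean upper-level loss with exact lower-level solves (evaluation only)."""
    H = np.eye(ys.shape[1]) + np.exp(theta) * (D.T @ D)
    xs = np.linalg.solve(H, ys.T).T
    return 0.5 * np.mean(np.sum((xs - x_trues) ** 2, axis=1))

def isgd(ys, x_trues, theta0=0.0, step=0.3, n_epochs=30, tol0=0.1, seed=0):
    """SGD on theta: one sampled data pair per step, inexact hypergradient."""
    rng = np.random.default_rng(seed)
    D = diff_operator(ys.shape[1])
    theta, k = theta0, 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(ys)):
            k += 1
            tol = tol0 / k  # assumed schedule: tighten accuracy over time
            theta -= step * inexact_hypergradient(ys[i], x_trues[i], theta, D, tol)
    return theta

def make_toy_data(m=8, n=64, sigma=0.3, seed=0):
    """Synthetic pairs: noisy sinusoids ys and their clean versions x_trues."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    x_trues = np.sin(2 * np.pi * t[None, :] + rng.uniform(0, 2 * np.pi, (m, 1)))
    ys = x_trues + sigma * rng.standard_normal((m, n))
    return ys, x_trues

if __name__ == "__main__":
    ys, x_trues = make_toy_data()
    D = diff_operator(ys.shape[1])
    theta = isgd(ys, x_trues)
    print("loss at theta0:", upper_loss(ys, x_trues, 0.0, D))
    print("loss at learned theta:", upper_loss(ys, x_trues, theta, D))
```

Each update touches a single training pair and solves its lower-level problem only to the current tolerance. A deterministic scheme would instead solve every lower-level problem before each upper-level step. Loosening `tol0` makes each step cheaper at the price of a more biased hypergradient, which is the trade-off the paper's convergence analysis is concerned with.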