Online Nonconvex Bilevel Optimization with Bregman Divergences

Jason Bohne, David Rosenberg, Gary Kazantsev, Pawel Polak
2024-09-17
Abstract: Bilevel optimization methods are increasingly relevant within machine learning, especially for tasks such as hyperparameter optimization and meta-learning. Compared to the offline setting, online bilevel optimization (OBO) offers a more dynamic framework by accommodating time-varying functions and sequentially arriving data. This study addresses the online nonconvex-strongly convex bilevel optimization problem. In deterministic settings, we introduce a novel online Bregman bilevel optimizer (OBBO) that utilizes adaptive Bregman divergences. We demonstrate that OBBO enhances the known sublinear rates for bilevel local regret through a novel hypergradient error decomposition that adapts to the underlying geometry of the problem. In stochastic contexts, we introduce the first stochastic online bilevel optimizer (SOBBO), which employs a window averaging method for updating outer-level variables using a weighted average of recent stochastic approximations of hypergradients. This approach not only achieves sublinear rates of bilevel local regret but also serves as an effective variance reduction strategy, obviating the need for additional stochastic gradient samples at each timestep. Experiments on online hyperparameter optimization and online meta-learning highlight the superior performance, efficiency, and adaptability of our Bregman-based algorithms compared to established online and offline bilevel benchmarks.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The paper proposes new algorithms for the online nonconvex-strongly convex bilevel optimization problem, in both deterministic and stochastic settings. Specifically:

1. **Online nonconvex-strongly convex bilevel optimization**: The paper studies how to perform bilevel optimization when the functions vary over time and the data arrive sequentially. Compared to the offline setting, online bilevel optimization (OBO) provides a more dynamic framework.
2. **OBBO in the deterministic setting**: The paper introduces an online Bregman bilevel optimizer (OBBO) that uses adaptive Bregman divergences. OBBO improves on the known sublinear rates for bilevel local regret through a novel hypergradient error decomposition that adapts to the underlying geometry of the problem.
3. **SOBBO in the stochastic setting**: The paper proposes the first stochastic online bilevel optimizer (SOBBO), which updates the outer-level variables with a window averaging method, that is, a weighted average of recent stochastic approximations of the hypergradient. This achieves sublinear rates of bilevel local regret and also acts as a variance reduction strategy, without requiring additional stochastic gradient samples at each timestep.

Experiments on online hyperparameter optimization and online meta-learning show that the proposed Bregman-based algorithms outperform established online and offline bilevel benchmarks in performance, efficiency, and adaptability.
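To make the two ingredients named above more concrete, the following is a minimal Python sketch of a window-averaged gradient estimate fed into a Bregman proximal step. It is not the paper's OBBO or SOBBO algorithm. Everything in it is an assumption made for illustration: the function names `window_average`, `bregman_step`, and `run_demo`, the geometric `decay` weighting, the fixed quadratic mirror map phi(u) = 0.5 * u^T H u (the paper uses adaptive Bregman divergences), and the toy quadratic objective whose noisy gradient stands in for a stochastic hypergradient.

```python
from collections import deque

import numpy as np


def window_average(grads, decay=0.9):
    """Weighted average of gradient estimates ordered oldest -> newest.

    The geometric weighting is an illustrative assumption; the newest
    estimate gets weight proportional to 1, the one before it to `decay`,
    and so on.
    """
    k = len(grads)
    weights = np.array([decay ** (k - 1 - i) for i in range(k)])
    weights /= weights.sum()
    return sum(w * g for w, g in zip(weights, grads))


def bregman_step(x, g, step, H):
    """Solve argmin_u <g, u> + (1 / step) * D_phi(u, x).

    With the assumed mirror map phi(u) = 0.5 * u^T H u (H symmetric
    positive definite), D_phi(u, x) = 0.5 * (u - x)^T H (u - x) and the
    minimizer has the closed form x - step * H^{-1} g.
    """
    return x - step * np.linalg.solve(H, g)


def run_demo(steps=200, window=5, step=0.2, noise=0.1, seed=0):
    """Toy run: noisy gradients of 0.5 * ||x - target||^2 stand in for
    stochastic hypergradient estimates (a hypothetical placeholder)."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])
    H = np.diag([1.0, 4.0])          # fixed geometry, chosen arbitrarily
    x = np.zeros(2)
    recent = deque(maxlen=window)    # keeps only the last `window` estimates
    for _ in range(steps):
        # One stochastic estimate per step; older ones are reused
        # through the window instead of drawing extra samples.
        recent.append((x - target) + noise * rng.standard_normal(2))
        x = bregman_step(x, window_average(list(recent)), step, H)
    return x, target


if __name__ == "__main__":
    x_final, target = run_demo()
    print("final iterate:", x_final, "target:", target)
```

With `H` set to the identity, `bregman_step` reduces to a plain gradient step, and with `decay=1.0` the window average is an unweighted mean; the sketch only shows how reusing recent estimates smooths the update direction without extra samples per step.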