Online Nonconvex Bilevel Optimization with Bregman Divergences

Jason Bohne, David Rosenberg, Gary Kazantsev, Pawel Polak

2024-09-17

Abstract: Bilevel optimization methods are increasingly relevant within machine learning, especially for tasks such as hyperparameter optimization and meta-learning. Compared to the offline setting, online bilevel optimization (OBO) offers a more dynamic framework by accommodating time-varying functions and sequentially arriving data. This study addresses the online nonconvex-strongly convex bilevel optimization problem. In deterministic settings, we introduce a novel online Bregman bilevel optimizer (OBBO) that utilizes adaptive Bregman divergences. We demonstrate that OBBO enhances the known sublinear rates for bilevel local regret through a novel hypergradient error decomposition that adapts to the underlying geometry of the problem. In stochastic contexts, we introduce the first stochastic online bilevel optimizer (SOBBO), which employs a window averaging method for updating outer-level variables using a weighted average of recent stochastic approximations of hypergradients. This approach not only achieves sublinear rates of bilevel local regret but also serves as an effective variance reduction strategy, obviating the need for additional stochastic gradient samples at each timestep. Experiments on online hyperparameter optimization and online meta-learning highlight the superior performance, efficiency, and adaptability of our Bregman-based algorithms compared to established online and offline bilevel benchmarks.
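For orientation, the objects named in the abstract can be written in standard textbook notation. This is a sketch only: the symbols $f_t$, $g_t$, $F_t$, $\phi$, and $\alpha$ are assumed here and are not taken from the paper, whose own definitions (in particular of bilevel local regret and of the adaptive divergence) may differ.

```latex
% Online bilevel problem at round t (notation assumed, not from the paper):
% outer objective f_t is nonconvex in x, inner objective g_t is strongly convex in y.
\min_{x \in \mathcal{X}} \; F_t(x) := f_t\bigl(x, y_t^*(x)\bigr)
\quad \text{s.t.} \quad
y_t^*(x) = \operatorname*{arg\,min}_{y} \; g_t(x, y)

% Hypergradient via the implicit function theorem, evaluated at (x, y_t^*(x)):
\nabla F_t(x) = \nabla_x f_t
  - \nabla^2_{xy} g_t \, \bigl[\nabla^2_{yy} g_t\bigr]^{-1} \nabla_y f_t

% Bregman divergence generated by a differentiable, strictly convex \phi:
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle

% Generic Bregman proximal step with step size \alpha > 0 and a hypergradient estimate \hat{g}_t:
x_{t+1} = \operatorname*{arg\,min}_{u \in \mathcal{X}}
  \Bigl\{ \langle \hat{g}_t, u \rangle + \tfrac{1}{\alpha} D_\phi(u, x_t) \Bigr\}
```

With $\phi(x) = \tfrac{1}{2}\|x\|^2$ the divergence reduces to $\tfrac{1}{2}\|x - y\|^2$ and the last step becomes projected gradient descent, which is why a Bregman step is usually described as adapting the update to the geometry induced by $\phi$.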

Optimization and Control, Machine Learning

What problem does this paper attempt to address?

The paper proposes new algorithms for online bilevel optimization (OBO) in the nonconvex-strongly convex setting. Specifically:

1. **Online nonconvex-strongly convex bilevel optimization**: The paper studies bilevel optimization with time-varying functions and sequentially arriving data, a more dynamic framework than traditional offline bilevel optimization.
2. **OBBO in the deterministic setting**: The paper introduces the Online Bregman Bilevel Optimizer (OBBO), which uses adaptive Bregman divergences to improve the known sublinear rates for bilevel local regret. The improvement rests on a novel hypergradient error decomposition that adapts to the underlying geometry of the problem.
3. **SOBBO in the stochastic setting**: The paper proposes the first stochastic online bilevel optimizer (SOBBO), which updates the outer-level variables with a window averaging method, that is, a weighted average of recent stochastic approximations of the hypergradient. This achieves sublinear rates of bilevel local regret and also acts as a variance reduction strategy, without requiring additional stochastic gradient samples at each timestep.

Experiments on online hyperparameter optimization and online meta-learning indicate that the proposed Bregman-based algorithms outperform established online and offline bilevel benchmarks in performance, efficiency, and adaptability.
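The window averaging and Bregman update described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the geometric weights `eta ** i`, the fixed diagonal quadratic reference function behind `bregman_step`, and all function names are assumptions made for illustration, and the hypergradient estimates are simply supplied as lists of floats.

```python
from collections import deque


def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def window_average(recent, eta=0.9):
    """Weighted average of recent gradient estimates.

    recent[0] is the newest estimate; estimate i gets weight eta ** i
    (a hypothetical weighting, chosen here for illustration).
    """
    weights = [eta ** i for i in range(len(recent))]
    total = sum(weights)
    dim = len(recent[0])
    return [sum(w * g[k] for w, g in zip(weights, recent)) / total
            for k in range(dim)]


def bregman_step(x, g, h_diag, lr):
    """Bregman proximal step for phi(u) = 0.5 * sum(h_diag[k] * u[k] ** 2).

    Solves argmin_u <g, u> + (1 / lr) * D_phi(u, x) without constraints,
    which has the closed form u[k] = x[k] - lr * g[k] / h_diag[k].
    """
    return [xi - lr * gi / hi for xi, gi, hi in zip(x, g, h_diag)]


# Usage: keep the last 3 hypergradient estimates and step on their average.
window = deque(maxlen=3)
x = [1.0, 1.0]
for noisy_grad in ([2.0, 4.0], [1.0, 3.0]):
    window.appendleft(noisy_grad)  # newest estimate goes to the front
    avg = window_average(window, eta=0.5)
    x = bregman_step(x, avg, h_diag=[2.0, 4.0], lr=0.5)
```

Reusing estimates already held in the window is what lets the average reduce variance without drawing extra samples at each timestep; an adaptive method would also update the reference function over time rather than keeping `h_diag` fixed as this sketch does.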
