Abstract: This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph. This problem can be viewed as a repeated game between a group of online players and an adversary. At each round, each player selects a decision from the constraint set, and then the adversary assigns an arbitrary, possibly nonconvex, loss function to this player. Only the loss value at the current round, rather than the entire loss function or any other information (e.g., the gradient), is privately revealed to the player. Players aim to minimize a sequence of global loss functions, each of which is the sum of the local losses. We observe that traditional multi-point bandit algorithms are unsuitable for online optimization, where the data for the loss function are not all available a priori, while one-point bandit algorithms suffer from poor regret guarantees. To address these issues, we propose a novel one-point residual feedback distributed online algorithm. This algorithm estimates the gradient using residuals from two points, effectively reducing the regret bound while maintaining $\mathcal{O}(1)$ sampling complexity per iteration. We employ a rigorous metric, dynamic regret, to evaluate the algorithm's performance. By appropriately selecting the step size and smoothing parameters, we demonstrate that the expected dynamic regret of our algorithm is comparable to that of existing algorithms that use two-point feedback, provided that the deviation in the objective function sequence and the path length of the minimizers grow sublinearly. Finally, we validate the effectiveness of the proposed algorithm through numerical simulations.
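To make the idea of residual feedback concrete, the following is a minimal single-player sketch, not the paper's algorithm. It assumes the commonly used residual form $g_t = \frac{d}{\delta}\,\bigl(f_t(x_t + \delta u_t) - f_{t-1}(x_{t-1} + \delta u_{t-1})\bigr)\,u_t$, where $u_t$ is a random unit vector and $\delta$ is the smoothing parameter; the abstract does not state the estimator explicitly, so this form is an assumption. The box constraint set, the shrunk projection, and all function names (`unit_vector`, `residual_feedback_gradient`, `project_box`, `run_player`) are hypothetical choices for illustration, and the consensus step over the time-varying digraph is omitted.

```python
import math
import random


def unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def residual_feedback_gradient(loss_value, prev_loss_value, direction, delta):
    """Assumed residual estimate: (d / delta) * (current - previous) * u."""
    dim = len(direction)
    scale = dim / delta * (loss_value - prev_loss_value)
    return [scale * u for u in direction]


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d (hypothetical constraint set)."""
    return [min(max(c, lo), hi) for c in x]


def run_player(losses, x0, step, delta, lo=-1.0, hi=1.0, seed=0):
    """Single-player loop; `losses` is the adversary's sequence of loss functions."""
    rng = random.Random(seed)
    # Keep iterates in a box shrunk by delta so every perturbed query is feasible.
    x = project_box(list(x0), lo + delta, hi - delta)
    prev_value = None
    history = []
    for loss in losses:
        u = unit_vector(len(x), rng)
        query = [xi + delta * ui for xi, ui in zip(x, u)]
        value = loss(query)  # the only loss evaluation in this round
        history.append(list(x))
        if prev_value is not None:
            # Reuse the previous round's loss value instead of a second query.
            g = residual_feedback_gradient(value, prev_value, u, delta)
            moved = [xi - step * gi for xi, gi in zip(x, g)]
            x = project_box(moved, lo + delta, hi - delta)
        prev_value = value
    return history
```

Each round costs exactly one loss evaluation, which is the $\mathcal{O}(1)$ sampling complexity mentioned above; the first round only records a loss value, since no residual is available yet.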

Risk-Averse Stochastic Convex Bandit

Risk-Averse No-Regret Learning in Online Convex Games

A Second-Order Method for Stochastic Bandit Convex Optimisation

Distributed Online Stochastic-Constrained Convex Optimization With Bandit Feedback

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Online Newton Method for Bandit Convex Optimisation

Second Order Methods for Bandit Optimization and Control

Federated Online and Bandit Convex Optimization

Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

The Online Saddle Point Problem and Online Convex Optimization with Knapsacks

Projection-Free Bandit Convex Optimization over Strongly Convex Sets

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Non-stationary Bandits with Habituation and Recovery Dynamics and Knapsack Constraints

A Survey of Risk-Aware Multi-Armed Bandits

Convex Methods for Constrained Linear Bandits

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

Online and Bandit Algorithms for Nonstationary Stochastic Saddle-Point Optimization

Online Stochastic Linear Optimization under One-bit Feedback

Comparator-adaptive Convex Bandits

Push-sum Distributed Dual Averaging Online Convex Optimization With Bandit Feedback