Abstract: This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph. This problem can be viewed as a repeated game between a group of online players and an adversary. At each round, each player selects a decision from the constraint set, and the adversary then assigns to this player an arbitrary, possibly nonconvex, loss function. Only the value of the loss at the current round, rather than the entire loss function or any other information (e.g., its gradient), is privately revealed to the player. The players aim to minimize a sequence of global loss functions, each of which is the sum of the local losses. We observe that traditional multi-point bandit algorithms are unsuitable for online optimization, where the data defining the loss function are not all available a priori, while one-point bandit algorithms suffer from poor regret guarantees. To address these issues, we propose a novel distributed online algorithm based on one-point residual feedback. This algorithm estimates the gradient from the residual between the loss values observed at two points (the current query point and the one from the previous round), which reduces the regret bound while keeping the sampling complexity at $\mathcal{O}(1)$ per iteration. We evaluate the algorithm's performance with a stringent metric, dynamic regret. By appropriately selecting the step size and smoothing parameters, we show that the expected dynamic regret of our algorithm is comparable to that of existing algorithms that use two-point feedback, provided that the deviation of the objective function sequence and the path length of the minimizer sequence grow sublinearly. Finally, we validate the effectiveness of the proposed algorithm through numerical simulations.
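To make the one-point residual feedback idea concrete, here is a minimal sketch, not the paper's exact algorithm. It assumes the estimator form common in the residual-feedback literature, $g_{i,t} = \frac{d}{\delta}\big(f_{i,t}(x_{i,t}+\delta u_{i,t}) - f_{i,t-1}(x_{i,t-1}+\delta u_{i,t-1})\big)u_{i,t}$ with $u_{i,t}$ uniform on the unit sphere, so each player queries only one new loss value per round and reuses the value it observed in the previous round. Everything else is a hypothetical choice made for illustration: the names `ResidualFeedbackPlayer`, `distributed_round` and `directed_ring`, the box constraint set, the quadratic local losses, and the time-varying digraph modelled as two alternating directed rings with doubly stochastic weights (the paper's weighting scheme may differ).

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d (assumed constraint set)."""
    return [min(max(xi, lo), hi) for xi in x]


class ResidualFeedbackPlayer:
    """Hypothetical player that observes a single loss value per round."""

    def __init__(self, x0, delta, step, rng):
        self.x = list(x0)
        self.delta = delta      # smoothing parameter
        self.step = step        # step size
        self.rng = rng
        self.u = None
        self.prev_value = None  # loss value observed in the previous round

    def query_point(self):
        """Perturb the current decision along a fresh random direction u_t."""
        self.u = sample_unit_sphere(len(self.x), self.rng)
        return [xi + self.delta * ui for xi, ui in zip(self.x, self.u)]

    def gradient_estimate(self, value):
        """(d / delta) * (this round's value - last round's value) * u_t."""
        # First round: no previous observation yet, so the estimate is zero.
        prev = value if self.prev_value is None else self.prev_value
        self.prev_value = value
        scale = len(self.x) / self.delta * (value - prev)
        return [scale * ui for ui in self.u]


def distributed_round(players, weights, losses, lo, hi):
    """Play, observe one value each, mix with in-neighbours, step, project."""
    values = [loss(p.query_point()) for p, loss in zip(players, losses)]
    grads = [p.gradient_estimate(v) for p, v in zip(players, values)]
    old = [list(p.x) for p in players]
    n, d = len(players), len(old[0])
    for i, p in enumerate(players):
        mixed = [sum(weights[i][j] * old[j][k] for j in range(n)) for k in range(d)]
        stepped = [m - p.step * g for m, g in zip(mixed, grads[i])]
        # Shrunk box keeps the next query x + delta*u inside [lo, hi].
        p.x = project_box(stepped, lo + p.delta, hi - p.delta)
    return values


def directed_ring(n, clockwise):
    """Doubly stochastic weights: keep half, take half from one in-neighbour."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n if clockwise else (i + 1) % n] = 0.5
    return W


# Usage: four players with local losses ||y - c_i||^2; the global loss
# (their sum) is minimized at the mean of the centres, i.e. the origin.
rng = random.Random(0)
centres = [(0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5)]
losses = [lambda y, c=c: sum((yk - ck) ** 2 for yk, ck in zip(y, c)) for c in centres]
players = [ResidualFeedbackPlayer([0.8, 0.8], 0.1, 0.002, rng) for _ in centres]
for t in range(3000):
    # The communication digraph alternates direction from round to round.
    distributed_round(players, directed_ring(4, t % 2 == 0), losses, -1.0, 1.0)
```

The residual reuses the previous round's observation instead of a second query, which is what keeps the per-round sampling cost at one loss evaluation; the price is that the estimator's variance grows with how far the decision moved between rounds, so in this toy setup the step size is kept small relative to $\delta/d$.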

Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

Regret Vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

Distributed Multi-Armed Bandits: Regret Vs. Communication

Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits

Optimal Regret Bounds for Collaborative Learning in Bandits

Distributed Online Learning for Joint Regret with Communication Constraints

Distributed Bandits with Heterogeneous Agents

Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

Achieve Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits

Distributed Stochastic Bandit Learning with Delayed Context Observation

Distributed No-Regret Learning for Multi-Stage Systems with End-to-End Bandit Feedback

On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits

Individual Regret in Cooperative Stochastic Multi-Armed Bandits

Collaborative Multi-agent Stochastic Linear Bandits

Communication-Efficient Regret-Optimal Distributed Online Convex Optimization

Federated Online and Bandit Convex Optimization

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

Constant or logarithmic regret in asynchronous multiplayer bandits

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs