Abstract: This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph. This problem can be viewed as a repeated game between a group of online players and an adversary. At each round, each player selects a decision from the constraint set, and the adversary then assigns an arbitrary, possibly nonconvex, loss function to that player. Only the loss value at the current round, rather than the entire loss function or any other information (e.g., its gradient), is privately revealed to the player. The players aim to minimize a sequence of global loss functions, each of which is the sum of the local losses. We observe that traditional multi-point bandit algorithms are unsuitable for online optimization, where the loss function data are not all available a priori, while one-point bandit algorithms suffer from poor regret guarantees. To address these issues, we propose a novel one-point residual feedback distributed online algorithm. This algorithm estimates the gradient using residuals from two points, effectively reducing the regret bound while maintaining $\mathcal{O}(1)$ sampling complexity per iteration. We employ a rigorous metric, dynamic regret, to evaluate the algorithm's performance. By appropriately selecting the step size and smoothing parameters, we demonstrate that the expected dynamic regret of our algorithm is comparable to that of existing algorithms that use two-point feedback, provided that the deviation in the objective function sequence and the path length of the minimizer sequence grow sublinearly. Finally, we validate the effectiveness of the proposed algorithm through numerical simulations.
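
The residual feedback idea in the abstract can be illustrated with a minimal single-player sketch. It assumes the common form of the estimator, in which the loss value queried at the previous round is reused so that each round costs only one new function evaluation. The abstract does not state the exact update rule, and the distributed consensus step over the digraph is omitted here. The function names, the ball-shaped constraint set, and the parameters `step`, `delta`, and `radius` are all hypothetical choices made for illustration, not the paper's algorithm.

```python
import math
import random


def unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0:
            return [v / norm for v in g]


def project_ball(x, radius):
    """Euclidean projection onto a centered ball (assumed constraint set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def residual_feedback_run(losses, x0, step, delta, radius, seed=0):
    """Run projected descent with a one-point residual gradient estimate.

    losses: one callable per round; only its value at the query point is used.
    Assumes 0 < delta < radius so that perturbed queries stay in the ball.
    """
    rng = random.Random(seed)
    dim = len(x0)
    x = list(x0)
    prev_value = None
    iterates, queried = [list(x)], []
    for loss in losses:
        u = unit_vector(dim, rng)
        # Single function evaluation per round, at a perturbed point.
        value = loss([xi + delta * ui for xi, ui in zip(x, u)])
        queried.append(value)
        if prev_value is None:
            grad = [0.0] * dim  # no residual is available in the first round
        else:
            # Residual between the current and the previous queried values.
            scale = dim / delta * (value - prev_value)
            grad = [scale * ui for ui in u]
        # Project onto a shrunk ball so the next query remains feasible.
        x = project_ball(
            [xi - step * gi for xi, gi in zip(x, grad)], radius - delta
        )
        prev_value = value
        iterates.append(list(x))
    return iterates, queried
```

Calling `residual_feedback_run` with a list of time-varying loss callables returns the decision sequence and the observed loss values. Reusing the previous observation is what keeps the per-round sampling cost at one evaluation while still forming a difference of two values.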

Online non-monotone diminishing return submodular maximization in the bandit setting

Online DR-Submodular Maximization: Minimizing Regret and Constraint Violation

Online DR-Submodular Maximization with Stochastic Cumulative Constraints

Improved Projection-free Online Continuous Submodular Maximization

Linear Submodular Maximization with Bandit Feedback

Sum-max Submodular Bandits

Unified Projection-Free Algorithms for Adversarial DR-Submodular Optimization

A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback

Maximizing Monotone DR-submodular Continuous Functions by Derivative-free Optimization

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function

Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback

Minimax Optimal Submodular Optimization with Bandit Feedback

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Bandits with Concave Aggregated Reward

A Fast Algorithm For Maximizing A Non-Monotone Dr-Submodular Integer Lattice Function

Online Stochastic Linear Optimization under One-bit Feedback

Combinatorial Multi-Armed Bandit with General Reward Functions

Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments

Per-Round Knapsack-Constrained Linear Submodular Bandits