Abstract: This work is concerned with distributed online bandit learning over a multi-agent network, in which a group of agents cooperatively seeks the minimizer of a time-varying global loss function. At each epoch, the global loss function is the sum of local loss functions, each known privately to an individual agent in the network. The local functions are revealed to the agents sequentially, and no agent has knowledge of future loss functions. The agents must therefore exchange messages to pursue an online estimate of the global loss function. We consider a bandit setup, in which only the values of the local loss functions at sampling points are disclosed to the agents. We also consider a more general network described by unbalanced digraphs, for which the corresponding weight matrix is only required to be row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online learning method for the online distributed convex problem. We then establish that the algorithm attains sublinear expected dynamic regret for convex and strongly convex loss functions, respectively, when the accumulative deviation of the minimizer sequence increases sublinearly. Moreover, the expected dynamic regret bound is analysed for strongly convex loss functions. In addition, an expected static regret bound of order O(√T) is obtained in the bandit setting, and a corresponding static regret bound of order O(ln T) is provided for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
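
To make the bandit setup concrete, the following is a minimal sketch of how a gradient can be estimated from loss values alone. It uses a two-point estimator, which is one common choice in bandit convex optimization; the abstract does not say which estimator the paper uses, so this is an assumption and not the authors' method. The names `two_point_gradient_estimate`, `quadratic_loss`, `averaged_estimate`, the smoothing radius `delta`, and the example loss are all hypothetical and chosen only for illustration.

```python
import math
import random


def sample_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussians)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def two_point_gradient_estimate(loss, x, delta, rng):
    """Estimate the gradient of `loss` at x using only two loss values.

    Assumption: a two-point estimator; the source only states that loss
    values at sampling points are available.
    """
    dim = len(x)
    u = sample_unit_vector(dim, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Scaled finite difference along the random direction u.
    scale = dim * (loss(x_plus) - loss(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def quadratic_loss(x, center=(1.0, -2.0, 0.5)):
    """Hypothetical local loss ||x - center||^2, with gradient 2 (x - center)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center))


def averaged_estimate(x, delta=0.1, samples=20000, seed=0):
    """Average many single-sample estimates to check them against the gradient."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_gradient_estimate(quadratic_loss, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]
```

For this quadratic loss the estimator is unbiased, so `averaged_estimate([0.0, 0.0, 0.0])` should lie close to the true gradient (-2, 4, -1). In a distributed method of the kind the abstract describes, each agent would feed such an estimate into its local mirror descent step after mixing its neighbours' states; that mixing step is not shown here.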

Distributed Online Bandit Tracking for Nash Equilibrium under Partial-Decision Information Setting

Decentralized Nash Equilibria Learning for Online Game with Bandit Feedback

Distributed Inertial Online Game Algorithm for Tracking Generalized Nash Equilibria

Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret

Continuous-Time Online Distributed Seeking for Generalized Nash Equilibrium of Nonmonotone Online Game

Distributed Online Algorithm with Inertia for Seeking Generalized Nash Equilibria

Distributed Alternated-Inertia Generalized Nash Equilibrium Seeking Algorithm: the Partial-Decision-information Case

Iteratively Regularized Gradient Tracking Methods for Optimal Equilibrium Seeking

Online Distributed Algorithms for Seeking Generalized Nash Equilibria in Dynamic Environments

Hybrid Nash Equilibrium Seeking under Partial-Decision Information: an Adaptive Dynamic Event-Triggered Approach

Distributed online generalized Nash Equilibrium learning in multi-cluster games: A delay-tolerant algorithm

Online Distributed Tracking of Generalized Nash Equilibrium on Physical Networks

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Distributed Prediction-Correction Algorithms for Time-Varying Nash Equilibrium Tracking

On the Linear Convergence of Distributed Nash Equilibrium Seeking for Multi-Cluster Games under Partial-Decision Information

Online Bandit Convex Optimization over a Network

Distributed No-Regret Learning for Multi-Stage Systems with End-to-End Bandit Feedback

Generalized Bandit Regret Minimizer Framework in Imperfect Information Extensive-Form Game

Statistical Privacy-Preserving Online Distributed Nash Equilibrium Tracking in Aggregative Games

Distributed Online Bandit Learning in Dynamic Environments over Unbalanced Digraphs

Distributed Nash equilibrium seeking strategies via bilateral bounded gradient approach