Abstract: This work is concerned with distributed online bandit learning over a multi-agent network, where a group of agents cooperatively seek the minimizer of a time-varying global loss function. At each epoch, the global loss function is the sum of local loss functions, each known privately by an individual agent in the network. Furthermore, local functions are revealed to agents sequentially, and no agent has knowledge of future loss functions. Thus, the agents must exchange messages to pursue an online estimation of the global loss function. In this paper, we are interested in a bandit setup, where only the values of local loss functions at sampling points are disclosed to agents. Meanwhile, we consider a more general network modeled by unbalanced digraphs, for which the corresponding weight matrix is only required to be row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online learning method for the online distributed convex problem. We then establish the sublinear expected dynamic regret attained by the algorithm for convex and strongly convex loss functions, respectively, when the accumulative deviation of the minimizer sequence increases sublinearly. Moreover, the expected dynamic regret bound is analyzed for strongly convex loss functions. In addition, an expected static regret bound of order O(√T) is obtained in the bandit setting, while the corresponding static regret bound of order O(ln T) is provided for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
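
To make the setting concrete, the following is a minimal sketch of distributed bandit learning over a row-stochastic network. It is not the algorithm of the paper, and several choices are assumptions made only for illustration: the Euclidean mirror map (so the mirror descent step reduces to a projected gradient step), a two-point bandit gradient estimator, rescaling each agent's step by a running estimate of the Perron vector of the weight matrix (a common device when the weights are only row stochastic), static quadratic local losses, and the step size 0.1/√(t+1). The names `two_point_gradient`, `project_ball`, and `run` are hypothetical.

```python
import math
import random


def two_point_gradient(loss, x, delta, rng):
    """Estimate a gradient from two loss values only (bandit feedback).

    Assumption: a two-point estimator along a random unit direction u.
    """
    d = len(x)
    raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(r * r for r in raw))
    u = [r / norm for r in raw]  # uniform direction on the unit sphere
    plus = loss([xi + delta * ui for xi, ui in zip(x, u)])
    minus = loss([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (plus - minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def run(weights, targets, rounds=5000, delta=0.01, radius=10.0, seed=0):
    """Run the sketch; weights[i][j] > 0 means agent i hears from agent j.

    Assumption: agent i holds the local loss f_i(x) = ||x - targets[i]||^2,
    so the global minimizer is the mean of the targets.
    """
    rng = random.Random(seed)
    n, d = len(targets), len(targets[0])
    losses = [
        (lambda x, b=b: sum((xi - bi) ** 2 for xi, bi in zip(x, b)))
        for b in targets
    ]
    x = [[0.0] * d for _ in range(n)]
    # z[i] is agent i's running estimate of the Perron vector (starts at e_i).
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(rounds):
        step = 0.1 / math.sqrt(t + 1)  # assumed diminishing step size
        # Consensus step: mix neighbors' states with row-stochastic weights.
        mixed = [
            [sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)
        ]
        z = [
            [sum(weights[i][j] * z[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)
        ]
        new_x = []
        for i in range(n):
            g = two_point_gradient(losses[i], x[i], delta, rng)
            # Dividing by z[i][i] compensates for the unbalanced digraph.
            moved = [m - step * gk / z[i][i] for m, gk in zip(mixed[i], g)]
            new_x.append(project_ball(moved, radius))
        x = new_x
    return x


if __name__ == "__main__":
    # Row stochastic but not column stochastic; strongly connected.
    W = [[0.5, 0.5, 0.0], [0.3, 0.3, 0.4], [0.0, 0.6, 0.4]]
    print(run(W, [[1.0], [2.0], [6.0]]))
```

In this sketch every agent's iterate should settle near the mean of the targets. Without the division by `z[i][i]`, the agents would instead drift toward a Perron-weighted combination of the local minimizers, which is why row-stochastic weights need some form of correction.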

Distributed Online Linear Regressions

Distributed Regularized Online Optimization Using Forward–backward Splitting

Distributed Censored Regression over Networks

Online Convex Optimization over Erdos-Renyi Random Networks

Differentially Private Distributed Online Linear Regression over a Time-Varying Network

Distributed Online Bandit Linear Regressions with Differential Privacy

Distributed Online Optimization with Long-Term Constraints

Dynamic Regret of Distributed Online Saddle Point Problem

Distributed Online Learning for Joint Regret with Communication Constraints

Distributed Ordinal Regression Over Networks

Distributed Censored Regression Over Networks

Distributed Online Learning with Adversarial Participants in an Adversarial Environment

Decentralized Online Regularized Learning Over Random Time-Varying Graphs

Online distributed optimization with stochastic gradients: high probability bound of regrets

Online Learning over Distributed Low-Rank Networks Via Sequential Power Iteration

Distributed Online Bandit Learning in Dynamic Environments over Unbalanced Digraphs

Privacy-Preserving Distributed Online Optimization Over Unbalanced Digraphs via Subgradient Rescaling

Distributed Linear Equations over Random Networks

Distributed No-Regret Learning for Stochastic Aggregative Games over Networks

Distributed Regression Estimation with Incomplete Data in Multi-Agent Networks

Distributed Dynamic Online Linear Regression over Unbalanced Graphs