Abstract: This work is concerned with distributed online bandit learning over a multi-agent network, in which a group of agents cooperatively seeks the minimizer of a time-varying global loss function. At each epoch, the global loss function is the sum of local loss functions, each known privately to an individual agent in the network. The local functions are revealed to the agents sequentially, and no agent has any knowledge of future loss functions. The agents must therefore exchange messages to pursue an online estimation of the global loss function. We focus on a bandit setup, in which only the values of the local loss functions at the sampling points are disclosed to the agents. We also consider a more general network model with unbalanced digraphs, for which the corresponding weight matrix is only required to be row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online learning method for the online distributed convex problem. We then establish that the algorithm attains sublinear expected dynamic regret for convex and strongly convex loss functions, respectively, provided that the accumulative deviation of the minimizer sequence grows sublinearly. Moreover, we analyse the expected dynamic regret bound for strongly convex loss functions. In addition, we obtain an expected static regret bound of order O(√T) in the bandit setting, and a corresponding static regret bound of order O(ln T) for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
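To make the setting in the abstract concrete, the following is a minimal sketch of one round of distributed bandit learning. It is not the paper's algorithm, and everything specific in it is an assumption made for illustration: a Euclidean mirror map (so the mirror descent step reduces to projected gradient descent), a one-point gradient estimator built from a single function value, a ball-shaped feasible set, and the hypothetical names `one_point_gradient`, `project_ball`, and `distributed_bandit_step`. It also omits any correction for the imbalance of the digraph, which a method using only a row-stochastic weight matrix would typically need.

```python
import numpy as np


def one_point_gradient(loss, x, delta, rng):
    """One-point bandit gradient estimate from a single loss value.

    Assumed estimator (not taken from the paper):
    (d / delta) * loss(x + delta * u) * u, with u uniform on the unit sphere.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalise to a random direction on the unit sphere
    return (d / delta) * loss(x + delta * u) * u


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centred at 0."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def distributed_bandit_step(X, losses, A, step, delta, radius, rng):
    """One round: consensus with row-stochastic A, then a local bandit update.

    X      : (n, d) array, row i is agent i's current iterate
    losses : list of n callables, agent i can only evaluate losses[i]
    A      : (n, n) row-stochastic weight matrix (columns need not sum to 1)
    """
    mixed = A @ X  # each agent averages the iterates of its in-neighbours
    new_X = np.empty_like(X)
    for i, loss in enumerate(losses):
        g = one_point_gradient(loss, mixed[i], delta, rng)
        # Project onto a shrunk ball so the query point x + delta * u
        # stays inside the ball of the full radius (illustrative choice).
        new_X[i] = project_ball(mixed[i] - step * g, radius - delta)
    return new_X
```

A caller would invoke `distributed_bandit_step` once per epoch, passing the local loss functions revealed at that epoch; the step size and the exploration radius `delta` are free parameters here and would need to be tuned or scheduled.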

Joint Learning of Network Topology and Opinion Dynamics Based on Bandit Algorithms

Learning on Dynamic Social Network

Bayesian learning in a network with multi-hypothesis decision exchanges

Networked Bandits With Disjoint Linear Payoffs

Analysis and Application of Opinion Model with Multiple Topic Interactions

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

Data-Driven Adaptive Consensus Learning From Network Topologies

Opinion Dynamics in Multi-Agent Systems with Binary Decision Exchanges

Distributed Online Bandit Learning in Dynamic Environments over Unbalanced Digraphs

Opinion Dynamics with Bayesian Learning

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Data-Driven Adaptive Iterative Learning Bipartite Consensus for Heterogeneous Nonlinear Cooperation-Antagonism Networks

Social Bandit Learning: Strangers Can Help

Opinion shaping in social networks using reinforcement learning

Non-Bayesian Social Learning with Multiview Observations

Adaptive Consensus of Multi-agents in Jointly Connected Networks

Social Learning in Multi-True-state Networks

Learning Communities from Equilibria of Nonlinear Opinion Dynamics

Non-Bayesian Learning in Social Networks with Time-Varying Weights

Social Learning with Bayesian Agents and Random Decision Making

Opinion Dynamics on Adaptive Networks: An Evolutionary Game Theoretical Approach with Incomplete Information