Distributed Multi-Armed Bandit over Arbitrary Undirected Graphs.

Jingxuan Zhu, Ji Liu
DOI: https://doi.org/10.1109/CDC45484.2021.9683253
2021-01-01
Abstract: This paper studies a distributed multi-armed bandit problem in a network of multiple agents, each of which can communicate only with its neighbors, where neighbor relationships are described by an undirected graph. Each agent makes a sequence of decisions, each selecting an arm from a given set of candidates, yet it has access only to local samples of the reward for each action, which is an unknown random variable. All agents share the same reward distribution for each arm. A distributed upper confidence bound (UCB) algorithm, which requires no global information, is proposed for the agents to cooperatively learn the best arm. It is shown that the algorithm achieves logarithmic regret for each agent, even when the graph is disconnected. The derived regret implies that the proposed distributed UCB algorithm enables faster learning for any agent in the network than the classical single-agent UCB algorithm, as long as the agent has at least one neighbor.
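For orientation, the classical single-agent UCB baseline that the abstract compares against can be sketched as follows. This is the standard UCB1 index rule, not the paper's distributed algorithm. The function names, the Bernoulli arms, and the exploration constant 2 are illustrative assumptions rather than details taken from the paper.

```python
import math
import random


def bernoulli(p):
    """Hypothetical arm: returns reward 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


def ucb1(arms, horizon, seed=0):
    """Classical single-agent UCB1 (illustrative baseline only).

    arms: list of callables taking a random.Random and returning a reward in [0, 1].
    Returns (pull counts per arm, empirical mean reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arms)
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once so each index is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda k: means[k] + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = arms[arm](rng)
        counts[arm] += 1
        # Incremental update of the empirical mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts, means


counts, means = ucb1([bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)], horizon=5000)
print(counts, means)
```

In this sketch the agent relies only on its own samples. In the setting the abstract describes, each agent would additionally communicate with its neighbors on the graph; the way that exchange enters the index is specified in the paper and is not reproduced here.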