Distributed Multi-Armed Bandits: Regret Vs. Communication.

Shuang Liu, Cheng Chen, Zhihua Zhang
2015-01-01
Abstract: In this paper, we generalize both the model and the results of the classical multi-armed bandit problem to a distributed setting, where a common arm set is shared by multiple players in a non-conflicting way. Moreover, the players receive the rewards independently and are allowed to communicate with each other after some prescribed rounds, which are given as the elements of a communication set. In particular, we study how communication can help to reduce the regret. We propose a novel concept to measure the frequency of communication, the density of the communication set, which is used to establish a non-trivial lower bound for the expected regret of any consistent policy. Furthermore, we develop a distributed policy, Dklucb, that can achieve the lower bound in the case of Bernoulli rewards. Compared to existing policies such as KL-UCB for classical multi-armed bandit problems, a crucial ingredient of Dklucb is to use a fake pull count. The analysis of the algorithm also becomes much more complex, requiring new tools and techniques. Finally, we discuss a possible extension of Dklucb for more general distributions.
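For orientation, the classical single-player KL-UCB index for Bernoulli rewards, which the abstract names as the baseline, can be sketched as follows. This is a minimal illustration of the standard index only, not of Dklucb: the abstract does not define the fake pull count or the communication schedule, so neither is reproduced here. The function names `bernoulli_kl` and `klucb_index`, the constant `c`, and the bisection depth `iters` are assumptions made for this example.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def klucb_index(mean, pulls, t, c=0.0, iters=50):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log t + c * log log t.

    mean  : empirical mean reward of the arm
    pulls : number of times the arm has been pulled
    t     : current round (t >= 1)
    c     : weight of the log log t term (assumed default 0)
    """
    if pulls == 0:
        return 1.0  # an unexplored arm gets the most optimistic index
    # log log t is only meaningful for t >= e, so it is floored at 0 here.
    budget = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / pulls
    lo, hi = mean, 1.0
    # kl(mean, q) is increasing in q on [mean, 1], so bisection finds the boundary.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

In each round, a KL-UCB player pulls the arm with the largest index. The index shrinks toward the empirical mean as `pulls` grows, which is how exploration of a well-sampled arm tapers off.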