Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang,Yu-Zhen Janice Chen,Stephen Pasteris,Mohammad H. Hajiesmaili,John C. S. Lui,Don Towsley
2021-01-01
Abstract:Motivated by the scenario of large-scale learning in distributed systems, this paper studies a scenario where M agents cooperate together to solve the same instance of a K-armed stochastic bandit problem. The agents have limited access to a local subset of arms and are asynchronous with different gaps between decision-making rounds. The goal is to find the global optimal arm and agents are able to pull any arm, however, they can only observe the reward when the selected arm is local. The challenge is a tradeoff for agents between pulling a local arm with observable feedback, or pulling external arms without feedback and relying on others’ observations that occur at different rates. We propose AAE-LCB, a two-stage algorithm that prioritizes pulling local arms following an active arm elimination policy, and switches to other arms only if all local arms are dominated by some external arms. We analyze the regret of AAE-LCB and show it matches the regret lower bound up to a small factor.
What problem does this paper attempt to address?