Reward Teaching for Federated Multi-armed Bandits

Abstract: Most existing federated multi-armed bandit (FMAB) designs presume that clients will implement the specified design to collaborate with the server. In reality, however, it may not be possible to modify the clients' existing protocols. To address this challenge, this work focuses on clients who always maximize their individual cumulative rewards, and introduces the novel idea of "reward teaching", in which the server guides the clients towards global optimality through implicit local reward adjustments. Under this framework, the server faces two tightly coupled tasks, bandit learning and target teaching, whose combination is non-trivial. A phased approach, called Teaching-After-Learning (TAL), is first designed to encourage and discourage the clients' explorations separately. General performance analyses of TAL are established when the clients' strategies satisfy certain mild requirements. With novel technical approaches developed to analyze the warm-start behaviors of bandit algorithms, particularized guarantees of TAL are then obtained for clients running UCB or ε-greedy strategies. These results demonstrate that TAL achieves logarithmic regret while incurring only logarithmic adjustment costs, which is order-optimal with respect to a natural lower bound. As a further extension, the Teaching-While-Learning (TWL) algorithm is developed with the idea of successive arm elimination to break the non-adaptive phase separation in TAL. Rigorous analyses demonstrate that, when facing clients running UCB1, TWL outperforms TAL in its dependence on the sub-optimality gaps, thanks to its adaptive design. Experimental results demonstrate the effectiveness and generality of the proposed algorithms.
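To make the idea of reward teaching concrete, here is a minimal Python sketch of the teaching side only. It assumes the server already knows the target (globally optimal) arm, so it skips the bandit-learning task entirely. It also uses a hypothetical adjustment rule of our own, not TAL: whenever the client pulls a non-target arm, the server zeroes the reward the client observes. The client is a standard UCB1 learner that maximizes the (adjusted) rewards it sees. The names `run_teaching_sketch`, `local_means`, and `target` are illustrative assumptions.

```python
import math
import random


def ucb1_index(total: float, count: int, t: int) -> float:
    """Standard UCB1 index: empirical mean plus a sqrt(2 ln t / n) exploration bonus."""
    return total / count + math.sqrt(2.0 * math.log(t) / count)


def run_teaching_sketch(local_means, target, horizon, seed=0):
    """Simulate one UCB1 client whose observed rewards are adjusted by a server.

    Hypothetical teaching rule (an illustration, not the paper's TAL): the server
    knows `target` in advance and, whenever the client pulls any other arm, sets
    the reward the client observes to 0, paying |adjustment| as cost.
    Returns (pull counts per arm, total adjustment cost).
    """
    rng = random.Random(seed)
    k = len(local_means)
    counts = [0] * k
    totals = [0.0] * k  # sums of *adjusted* rewards, as seen by the client
    cost = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # UCB1 initialization: pull each arm once
        else:
            arm = max(range(k), key=lambda a: ucb1_index(totals[a], counts[a], t))
        reward = 1.0 if rng.random() < local_means[arm] else 0.0  # Bernoulli reward
        # Discourage non-target arms by cancelling their observed reward.
        adjustment = -reward if arm != target else 0.0
        cost += abs(adjustment)
        counts[arm] += 1
        totals[arm] += reward + adjustment  # the client only sees the adjusted reward
    return counts, cost


if __name__ == "__main__":
    # Arm 0 is locally best (0.9), but the server steers the client to arm 1.
    counts, cost = run_teaching_sketch([0.9, 0.5, 0.3], target=1, horizon=5000)
    print("pulls per arm:", counts, "adjustment cost:", cost)
```

In this toy setup the target arm can be locally suboptimal and still be selected almost all of the time. The cost stays small relative to the horizon because UCB1 pulls arms whose adjusted mean is 0 only about logarithmically often. The sketch does not reproduce TAL's separate encouraging/discouraging phases or the server's own learning of the target.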