Abstract: We study a robust multi-agent multi-armed bandit problem, i.e., one in the presence of malicious participants, in which multiple participants are distributed on a fully decentralized blockchain and some of them may be malicious. The rewards of the arms are homogeneous among the honest participants and follow time-invariant stochastic distributions. They are revealed to the participants only when certain conditions are met, which ensures that the coordination mechanism is sufficiently secure. The objective of the coordination mechanism is to efficiently maximize the cumulative rewards gained by the honest participants. To this end, we are the first to incorporate advanced techniques from blockchains, together with novel mechanisms, into such a cooperative decision-making framework in order to design optimal strategies for the honest participants. The framework accommodates various malicious behaviors while maintaining security and participant privacy. More specifically, we select a pool of validators who communicate with all participants, design a new consensus mechanism based on digital signatures for these validators, devise a UCB-based strategy that requires less information from participants by means of secure multi-party computation, and design both the chain-participant interaction and an incentive mechanism that encourages participation. Notably, we are the first to establish a theoretical regret bound in this setting and to claim the optimality of the proposed algorithm. Existing work that integrates blockchains with learning problems such as federated learning mainly demonstrates optimality through computational experiments. In contrast, we prove that the regret of the honest participants is upper bounded by $O(\log T)$ under certain assumptions. This bound is consistent with known results for the multi-agent multi-armed bandit problem, both without malicious participants and under purely Byzantine attacks that do not affect the entire system.
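The abstract pairs a UCB-based index with secure multi-party computation so that participants reveal less information. The following is a minimal sketch under our own assumptions, not the authors' algorithm. It uses additive secret sharing modulo a prime, a standard MPC building block, so that a pool of validators jointly learns only the aggregated per-arm reward sums and pull counts. It then applies the classic UCB1 index to those aggregates. Consensus, digital signatures, malicious participants, and the incentive mechanism are all omitted. The names `share`, `secure_sum`, `ucb_choice`, and `run`, the Bernoulli reward model, and the choice that every honest participant pulls the same arm each round are hypothetical simplifications.

```python
import math
import random

PRIME = 2**61 - 1  # field modulus for additive secret sharing (assumed choice)


def share(value: int, n_parties: int, rng: random.Random) -> list[int]:
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list[int], n_validators: int, rng: random.Random) -> int:
    """Each participant sends one share of its value to each validator.

    A single validator only sees uniformly random shares. Adding the validators'
    partial totals reveals the aggregate, and, as long as at least one validator
    does not collude, nothing about any individual input.
    """
    partial = [0] * n_validators
    for value in private_values:
        for j, s in enumerate(share(value, n_validators, rng)):
            partial[j] = (partial[j] + s) % PRIME
    return sum(partial) % PRIME


def ucb_choice(sums: list[int], counts: list[int]) -> int:
    """Classic UCB1 index computed on the aggregated statistics."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the index
    total = sum(counts)
    scores = [sums[a] / counts[a] + math.sqrt(2 * math.log(total) / counts[a])
              for a in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)


def run(means: list[float], n_participants: int = 3, n_validators: int = 4,
        horizon: int = 1000, seed: int = 0) -> list[int]:
    """Simulate honest participants with Bernoulli rewards; return rounds per arm."""
    rng = random.Random(seed)
    k = len(means)
    local_sums = [[0] * k for _ in range(n_participants)]    # private 0/1 reward sums
    local_counts = [[0] * k for _ in range(n_participants)]  # private pull counts
    pulls = [0] * k
    for _ in range(horizon):
        # Aggregate each arm's statistics without exposing any participant's own values.
        sums = [secure_sum([local_sums[p][a] for p in range(n_participants)], n_validators, rng)
                for a in range(k)]
        counts = [secure_sum([local_counts[p][a] for p in range(n_participants)], n_validators, rng)
                  for a in range(k)]
        arm = ucb_choice(sums, counts)
        pulls[arm] += 1
        for p in range(n_participants):  # simplification: everyone pulls the chosen arm
            local_sums[p][arm] += 1 if rng.random() < means[arm] else 0
            local_counts[p][arm] += 1
    return pulls


if __name__ == "__main__":
    print(run([0.2, 0.5, 0.8]))
```

With arm means `[0.2, 0.5, 0.8]`, the returned pull counts concentrate on the third arm. The exact numbers depend on the seed. Real-valued rewards would additionally need a fixed-point encoding before sharing. UCB1 is used here only because it is known to achieve logarithmic regret in the single-agent stochastic setting, matching the order of the bound stated above.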

Distributed Bandits with Heterogeneous Agents

Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents

Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

Fair Multi-Agent Bandits

Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis

Federated Combinatorial Multi-Agent Multi-Armed Bandits

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Collaborative Multi-agent Stochastic Linear Bandits

Networked Bandits With Disjoint Linear Payoffs

Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Achieve Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits

Distributed Stochastic Bandit Learning with Delayed Context Observation

Byzantine-Resilient Decentralized Multi-Armed Bandits

QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits

Multi-Agent Best Arm Identification in Stochastic Linear Bandits

Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels

Decentralized Blockchain-based Robust Multi-agent Multi-armed Bandit

Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions