Abstract: We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t \to \infty$) it pulls only the arm with the highest average reward. While this goal is provably impossible for an isolated individual, we show that, in social groups, it can be achieved easily with the aid of social persuasion, i.e., communication. Specifically, we study learning dynamics wherein an individual sequentially decides which arm to pull next based not only on its private reward feedback but also on the suggestions provided by randomly chosen peers. Our learning dynamics are hard to analyze via explicit probabilistic calculations due to the stochastic dependency induced by social interaction. Instead, we employ the mean-field approximation method from statistical physics and show: (1) With probability $\to 1$ as the social group size $N \to \infty$, every individual in the social group learns the best option. (2) Over an arbitrary finite time horizon $[0, T]$, with high probability (in $N$), the fraction of individuals that prefer the best option grows to 1 exponentially fast as $t$ increases ($t \in [0, T]$). A major innovation of our mean-field analysis is a simple yet powerful technique to deal with absorbing states in the interchange of limits $N \to \infty$ and $t \to \infty$. The mean-field approximation method allows us to approximate the probabilistic sample paths of our learning dynamics by a deterministic and smooth trajectory that corresponds to the unique solution of a well-behaved system of ordinary differential equations (ODEs). Such an approximation is desirable because a system of ODEs is easier to analyze than the original stochastic system.
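To make the kind of dynamics described in the abstract concrete, the following is a minimal simulation sketch. It is not the paper's exact model: the abstract does not spell out the update rule, so the rule below is an assumption chosen for illustration. Each individual remembers only one preferred arm (bounded memory); when it acts, it takes a suggestion from a uniformly random peer (or, with hypothetical probability `explore`, a uniformly random arm), pulls that arm, and adopts it only if the Bernoulli reward is 1. The function name `simulate_social_bandit`, its parameters, and the arm means are all illustrative assumptions.

```python
import random


def simulate_social_bandit(n_agents=500, probs=(0.3, 0.5, 0.8),
                           explore=0.0, steps=100_000, seed=0):
    """Illustrative (assumed) social bandit dynamics with one memory slot per agent.

    Returns the fraction of agents preferring the best arm, recorded once
    every n_agents steps (roughly one unit of "mean-field time").
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    best = max(range(n_arms), key=lambda a: probs[a])

    # Bounded memory: each agent stores only the index of its preferred arm.
    prefs = [rng.randrange(n_arms) for _ in range(n_agents)]
    counts = [0] * n_arms
    for p in prefs:
        counts[p] += 1

    history = []
    for t in range(steps):
        if t % n_agents == 0:
            history.append(counts[best] / n_agents)

        agent = rng.randrange(n_agents)  # a uniformly random agent acts
        if rng.random() < explore:
            candidate = rng.randrange(n_arms)           # private exploration
        else:
            candidate = prefs[rng.randrange(n_agents)]  # suggestion from a random peer

        # Pull the candidate arm; adopt it only if the Bernoulli reward is 1.
        if rng.random() < probs[candidate]:
            counts[prefs[agent]] -= 1
            prefs[agent] = candidate
            counts[candidate] += 1

    return history


if __name__ == "__main__":
    fractions = simulate_social_bandit()
    for step in range(0, len(fractions), 20):
        print(f"time {step:3d}: fraction on best arm = {fractions[step]:.3f}")
```

Under this assumed rule with `explore=0.0`, the expected drift of the fraction $x_a$ of individuals preferring arm $a$ is $x_a\,(p_a - \sum_b x_b p_b)$, a replicator-type ODE, which is one way to see why a deterministic trajectory can approximate the sample paths when $N$ is large. Note also that with `explore=0.0` any state in which all individuals agree is absorbing, including agreement on a suboptimal arm, which illustrates why the interchange of the limits $N \to \infty$ and $t \to \infty$ needs care.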

Collaborative Top Distribution Identifications with Limited Interaction (Extended Abstract)

Communication-Efficient Collaborative Best Arm Identification

Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-armed Bandits

Parallel Best Arm Identification in Heterogeneous Environments

On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs

Collaboratively Learning the Best Option, Using Bounded Memory

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

Practical Algorithms for Best-K Identification in Multi-Armed Bandits

Multi-Agent Best Arm Identification in Stochastic Linear Bandits

Collaborative Deep Learning in Fixed Topology Networks

Distribution-Dependent Rates for Multi-Distribution Learning

Information-Directed Selection for Top-Two Algorithms

Distributed Bandits with Heterogeneous Agents

Cost Aware Best Arm Identification

On-Demand Sampling: Learning Optimally from Multiple Distributions

Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

Optimal Top-Two Method for Best Arm Identification and Fluid Analysis

On Collaboration in Distributed Parameter Estimation with Resource Constraints

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Collaboratively Learning Linear Models with Structured Missing Data

Adaptive Multiple-Arm Identification