Abstract: How to incentivize self-interested agents to explore when they prefer to exploit? Consider a population of self-interested agents that make decisions under uncertainty. They "explore" to acquire new information and "exploit" this information to make good decisions. Collectively they need to balance these two objectives, but their incentives are skewed toward exploitation. This is because exploration is costly, but its benefits are spread over many agents in the future. "Incentivized Exploration" addresses this issue via strategic communication. Consider a benign "principal" which can communicate with the agents and make recommendations, but cannot force the agents to comply. Moreover, suppose the principal can observe the agents' decisions and the outcomes of these decisions. The goal is to design a communication and recommendation policy which (i) achieves a desirable balance between exploration and exploitation, and (ii) incentivizes the agents to follow recommendations. What makes it feasible is "information asymmetry": the principal knows more than any one agent, as it collects information from many. It is essential that the principal does not fully reveal all its knowledge to the agents. Incentivized exploration combines two important problems in machine learning and theoretical economics, respectively. First, if agents always follow recommendations, the principal faces a multi-armed bandit problem: essentially, design an algorithm that balances exploration and exploitation. Second, interaction with a single agent corresponds to "Bayesian persuasion", where a principal leverages information asymmetry to convince an agent to take a particular action. We provide a brief but self-contained introduction to each problem through the lens of incentivized exploration, solving a key special case of the former as a sub-problem of the latter.

Disentangling Exploration from Exploitation

The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

Approximate information for efficient exploration-exploitation strategies

Robust experimentation in the continuous time bandit problem

Fair Exploration via Axiomatic Bargaining

Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints

The Perils of Exploration under Competition: A Computational Modeling Approach

Incentivizing Exploration with Heterogeneous Value of Money

Exploration Unbound

Humans adaptively resolve the explore-exploit dilemma under cognitive constraints: Evidence from a multi-armed bandit task

Adaptive Experimentation When You Can't Experiment

Exploration by Running Away from the Past

Exploration and Persuasion

Behind the Myth of Exploration in Policy Gradients

Bayesian Incentive-Compatible Bandit Exploration

Bayesian Exploration with Heterogeneous Agents

Fair Exploration and Exploitation

Competing Bandits: The Perils of Exploration Under Competition

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Temporally-Extended ε-Greedy Exploration

Optimal Exploration is no harder than Thompson Sampling