Disentangling Exploration from Exploitation

Alessandro Lizzeri, Eran Shmaya, Leeat Yariv
2024-04-30
Abstract: Starting from Robbins (1952), the literature on experimentation via multi-armed bandits has wed exploration and exploitation. Nonetheless, in many applications, agents' exploration and exploitation need not be intertwined: a policymaker may assess new policies different than the status quo; an investor may evaluate projects outside her portfolio. We characterize the optimal experimentation policy when exploration and exploitation are disentangled in the case of Poisson bandits, allowing for general news structures. The optimal policy features complete learning asymptotically, exhibits lots of persistence, but cannot be identified by an index a la Gittins. Disentanglement is particularly valuable for intermediate parameter values.
Theoretical Economics, Computer Science and Game Theory
What problem does this paper attempt to address?
The paper studies multi-armed bandit experimentation when exploration and exploitation are separated. Traditional bandit models tie the two together: an agent learns about an arm only by pulling it. In many applications, however, a decision-maker can evaluate one option while collecting payoffs from another. Using a Poisson bandit model with general news structures, the paper characterizes the optimal experimentation policy under this separation. The optimal policy achieves complete learning asymptotically and exhibits substantial persistence, but it cannot be characterized by an index in the manner of the Gittins index. The gains from separation are largest for intermediate parameter values. The paper also discusses complete and partial separation, and how the balance between exploration and exploitation varies across information structures.