Learning and choosing in an uncertain world: An investigation of the explore-exploit dilemma in static and dynamic environments

Daniel J Navarro, Ben R Newell, Christin Schulze
DOI: https://doi.org/10.1016/j.cogpsych.2016.01.001
Abstract: How do people solve the explore-exploit trade-off in a changing environment? In this paper we present experimental evidence from an "observe or bet" task, in which people have to determine when to engage in information-seeking behavior and when to switch to reward-taking actions. In particular we focus on the comparison between people's behavior in a changing environment and their behavior in an unchanging one. Our experimental work is motivated by rational analysis of the problem that makes strong predictions about information search and reward seeking in static and changeable environments. Our results show a striking agreement between human behavior and the optimal policy, but also highlight a number of systematic differences. In particular, we find that while people often employ suboptimal strategies the first time they encounter the learning problem, most people are able to approximate the correct strategy after minimal experience. In order to describe the manner in which people's choices are similar to but slightly different from an optimal standard, we introduce four process models for the observe or bet task and evaluate them as potential theories of human behavior.