Sequential sampling without comparison to boundary through model-free reinforcement learning

Jamal Esmaily, Rani Moran, Yasser Roudi, Bahador Bahrami
2024-08-12
Abstract: Although the model of evidence integration to a decision boundary has successfully explained a wide range of behavioral and neural data in decision making under uncertainty, how animals learn and optimize the boundary remains unresolved. Here, we propose a model-free reinforcement learning algorithm for perceptual decisions under uncertainty that dispenses entirely with the concepts of decision boundary and evidence accumulation. Our model learns whether to commit to a decision given the available evidence or to continue sampling information at a cost. We reproduce canonical features of perceptual decision-making, such as the dependence of accuracy and reaction time on evidence strength and the modulation of the speed-accuracy trade-off by payoff regime, among many others. By unifying learning and decision making within the same framework, this model can account for unstable behavior during training as well as stabilized post-training behavior, opening the door to revisiting the extensive volumes of discarded training data in the decision science literature.
Neural and Evolutionary Computing
What problem does this paper attempt to address?
The paper addresses how animals learn and optimize decision boundaries in perceptual decision-making under uncertainty. The evidence-accumulation-to-boundary model has successfully explained behavioral and neural data from decisions made under uncertainty, but the mechanisms by which animals learn these boundaries remain unclear. The paper proposes a model-free reinforcement learning approach that abandons the concepts of decision boundary and evidence accumulation altogether. Instead, the agent learns, given the current evidence, whether to make a decision or to continue sampling information. This approach reproduces classic features of perceptual decision-making, such as the dependence of accuracy and reaction time on evidence strength and the effect of the payoff regime on the speed-accuracy trade-off. It also explains, within a single framework, both the unstable behavior seen during learning and the stable behavior seen after training. This opens the possibility of reanalyzing the large volume of training data that has typically been discarded in the decision science literature.
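The abstract describes the model only at a high level. The Python sketch below illustrates the general idea of learning by trial and error whether to commit to a choice or to pay for another sample, using plain tabular Q-learning. It is not the authors' implementation. The following are all assumptions made for illustration:

- the state coding: the elapsed step plus a coarse bin of the most recent noisy sample, with no running sum of evidence and no fixed boundary;
- the unit-variance Gaussian evidence;
- the reward of 1 for a correct choice and 0 for an error;
- the per-sample cost;
- the hypothetical function names `observe`, `discretize`, `run_trial`, `train` and `evaluate`.

```python
import random
from collections import defaultdict

# Hypothetical action set (an assumption): commit to either choice, or pay for one more sample.
ACTIONS = ("left", "right", "sample")


def observe(direction, strength, rng):
    """One noisy evidence sample (assumed Gaussian with unit variance)."""
    mean = strength if direction == "right" else -strength
    return rng.gauss(mean, 1.0)


def discretize(x, n_bins=7, lim=2.1):
    """Map a sample to a coarse bin index; an assumed state coding, not the paper's."""
    x = max(-lim, min(lim, x))
    width = 2 * lim / n_bins
    return min(int((x + lim) / width), n_bins - 1)


def allowed_actions(t, max_steps):
    # On the last permitted step the agent must commit.
    return ACTIONS[:2] if t == max_steps - 1 else ACTIONS


def run_trial(q, strength, rng, alpha=0.1, epsilon=0.1, cost=0.02,
              max_steps=10, learn=True):
    """Play one trial; if learn is True, apply tabular Q-learning updates (gamma = 1).

    The state is (elapsed step, bin of the latest sample): no running sum of
    evidence is kept and no decision boundary is imposed.
    """
    direction = rng.choice(("left", "right"))
    t = 0
    state = (t, discretize(observe(direction, strength, rng)))
    while True:
        allowed = allowed_actions(t, max_steps)
        if learn and rng.random() < epsilon:
            action = rng.choice(allowed)  # epsilon-greedy exploration
        else:
            action = max(allowed, key=lambda a: q[state][a])
        if action == "sample":
            t += 1
            next_state = (t, discretize(observe(direction, strength, rng)))
            best_next = max(q[next_state][a] for a in allowed_actions(t, max_steps))
            target = -cost + best_next  # pay for the sample, bootstrap on the next state
        else:
            correct = action == direction
            target = 1.0 if correct else 0.0  # terminal reward for the choice
        if learn:
            q[state][action] += alpha * (target - q[state][action])
        if action != "sample":
            return correct, t + 1  # choice accuracy and number of samples (an RT proxy)
        state = next_state


def train(n_trials=50_000, strengths=(0.1, 0.4, 0.8), seed=0):
    """Learn from scratch over trials whose evidence strength varies at random."""
    rng = random.Random(seed)
    q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
    for _ in range(n_trials):
        run_trial(q, rng.choice(strengths), rng)
    return q, rng


def evaluate(q, strength, rng, n=5000):
    """Greedy (no-learning) accuracy and mean number of samples at one evidence strength."""
    results = [run_trial(q, strength, rng, learn=False) for _ in range(n)]
    accuracy = sum(c for c, _ in results) / n
    mean_samples = sum(s for _, s in results) / n
    return accuracy, mean_samples


if __name__ == "__main__":
    q, rng = train()
    for s in (0.1, 0.4, 0.8):
        print(s, evaluate(q, s, rng))
```

Calling `evaluate` at weak and strong evidence levels shows whether the learned policy becomes more accurate, and samples less, as evidence strength grows. Calling `run_trial` with `learn=True` on an empty table and recording each outcome instead exposes the noisier behavior that occurs during training. Raising `cost` is one simple way to mimic a payoff regime that favors speed over accuracy.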