A distributional code for value in dopamine-based reinforcement learning

Will Dabney, Zeb Kurth-Nelson, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Rémi Munos, Matthew Botvinick
DOI: https://doi.org/10.1038/s41586-019-1924-6
IF: 64.8
2020-01-15
Nature
Abstract: Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1,2,3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4,5,6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
multidisciplinary sciences
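The contrast drawn in the abstract, between a single scalar prediction of the mean and a population of predictions that together describe a distribution, can be illustrated with a small sketch. The abstract does not state the learning rule, so everything below is an assumption made for illustration: each predictor scales positive and negative reward prediction errors differently (an expectile-style update), and the function name `learn_values`, the asymmetry parameter `tau`, the learning rate, and the reward values are all hypothetical.

```python
def learn_values(rewards, taus, lr=0.01, epochs=2000):
    """Train one value predictor per asymmetry `tau` in (0, 1).

    Assumed rule (not taken from the abstract): positive prediction
    errors are scaled by tau and negative ones by (1 - tau). With
    tau = 0.5 the update is symmetric and the predictor tracks the mean;
    other values of tau settle on more pessimistic or optimistic
    estimates of the same reward distribution.
    """
    values = [0.0 for _ in taus]
    for _ in range(epochs):
        # Sweep the possible outcomes in a fixed order (equal probability).
        for r in rewards:
            for i, tau in enumerate(taus):
                delta = r - values[i]  # reward prediction error
                scale = tau if delta > 0 else 1.0 - tau
                values[i] += lr * scale * delta
    return values


# Hypothetical reward magnitudes, each equally likely.
rewards = [1.0, 2.0, 5.0, 10.0, 20.0]
taus = [0.1, 0.5, 0.9]  # pessimistic, symmetric, optimistic predictors

for tau, v in zip(taus, learn_values(rewards, taus)):
    print(f"tau={tau:.1f} -> learned value {v:.2f}")
```

Under these assumptions the symmetric predictor settles near the mean of the rewards, while the asymmetric predictors settle below and above it, so the set of learned values carries information about the spread of outcomes rather than only their average.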