Abstract: Learning to optimally predict rewards requires agents to account for fluctuations in reward value. Recent work suggests that individuals can efficiently learn about variable rewards through adaptation of the learning rate and coding of prediction errors relative to reward variability. Such adaptive coding has been linked to midbrain dopamine neurons in nonhuman primates, and evidence in support of a similar role of the dopaminergic system in humans is emerging from fMRI data. Here, we sought to investigate the effect of dopaminergic perturbations on adaptive prediction error coding in humans, using a between-subject, placebo-controlled pharmacological fMRI study with a dopaminergic agonist (bromocriptine) and antagonist (sulpiride). Participants performed a previously validated task in which they predicted the magnitude of upcoming rewards drawn from distributions with varying SDs. After each prediction, participants received a reward, yielding trial-by-trial prediction errors. Under placebo, we replicated previous observations of adaptive coding in the midbrain and ventral striatum. Treatment with sulpiride attenuated adaptive coding in both midbrain and ventral striatum, and was associated with a decrease in performance, whereas bromocriptine did not have a significant impact. Although we observed no differential effect of SD on performance between the groups, computational modeling suggested decreased behavioral adaptation in the sulpiride group. These results suggest that normal dopaminergic function is critical for adaptive prediction error coding, a key property of the brain thought to facilitate efficient learning in variable environments. Crucially, these results also offer potential insights for understanding the impact of disrupted dopamine function in mental illness.

SIGNIFICANCE STATEMENT: To choose optimally, we have to learn what to expect. Humans dampen learning when there is a great deal of variability in reward outcome, and two brain regions that are modulated by the brain chemical dopamine are sensitive to reward variability. Here, we aimed to directly relate dopamine to learning about variable rewards, and the neural encoding of associated teaching signals. We perturbed dopamine in healthy individuals using dopaminergic medication and asked them to predict variable rewards while we made brain scans. Dopamine perturbations impaired learning and the neural encoding of reward variability, thus establishing a direct link between dopamine and adaptation to reward variability. These results aid our understanding of clinical conditions associated with dopaminergic dysfunction, such as psychosis.
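
As an illustration of what coding prediction errors relative to reward variability can mean, the sketch below scales a simple error-driven update by the SD of the reward distribution. It is not the computational model used in the study. The function names, the division by SD, and the learning rate of 0.1 are all assumptions chosen for clarity.

```python
def prediction_error(reward, prediction):
    """Raw prediction error: received reward minus predicted reward."""
    return reward - prediction


def adaptive_prediction_error(reward, prediction, sd):
    """Prediction error expressed in units of the reward SD.

    Assumption for illustration: adaptation is modeled as a plain
    division by the SD of the current reward distribution.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return prediction_error(reward, prediction) / sd


def update_prediction(prediction, reward, sd, learning_rate=0.1, adaptive=True):
    """One error-driven update of the reward prediction.

    With adaptive=True the SD-scaled error drives the update, so the same
    raw error moves the prediction less when rewards are more variable.
    """
    if adaptive:
        delta = adaptive_prediction_error(reward, prediction, sd)
    else:
        delta = prediction_error(reward, prediction)
    return prediction + learning_rate * delta


if __name__ == "__main__":
    # Same raw error (10 points) under a narrow and a wide distribution.
    for sd in (5.0, 15.0):
        new_prediction = update_prediction(50.0, 60.0, sd)
        print(f"SD={sd}: prediction moves from 50.0 to {new_prediction:.3f}")
```

Under these assumptions, an outcome that lies two SDs above the prediction produces the same scaled error whether the SD is 5 or 15, while a fixed raw error produces a smaller update in the high-SD condition. This matches the dampened learning under high variability described above.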