Abstract: Learning to optimally predict rewards requires agents to account for fluctuations in reward value. Recent work suggests that individuals can efficiently learn about variable rewards through adaptation of the learning rate and coding of prediction errors relative to reward variability. Such adaptive coding has been linked to midbrain dopamine neurons in nonhuman primates, and evidence in support of a similar role for the dopaminergic system in humans is emerging from fMRI data. Here, we sought to investigate the effect of dopaminergic perturbations on adaptive prediction error coding in humans, using a between-subject, placebo-controlled pharmacological fMRI study with a dopaminergic agonist (bromocriptine) and antagonist (sulpiride). Participants performed a previously validated task in which they predicted the magnitude of upcoming rewards drawn from distributions with varying standard deviations (SDs). After each prediction, participants received a reward, yielding trial-by-trial prediction errors. Under placebo, we replicated previous observations of adaptive coding in the midbrain and ventral striatum. Treatment with sulpiride attenuated adaptive coding in both midbrain and ventral striatum and was associated with a decrease in performance, whereas bromocriptine did not have a significant impact. Although we observed no differential effect of SD on performance between the groups, computational modeling suggested decreased behavioral adaptation in the sulpiride group. These results suggest that normal dopaminergic function is critical for adaptive prediction error coding, a key property of the brain thought to facilitate efficient learning in variable environments. Crucially, these results also offer potential insights for understanding the impact of disrupted dopamine function in mental illness.

SIGNIFICANCE STATEMENT
To choose optimally, we have to learn what to expect.
Humans dampen learning when there is a great deal of variability in reward outcome, and two brain regions that are modulated by the brain chemical dopamine are sensitive to reward variability. Here, we aimed to directly relate dopamine to learning about variable rewards and to the neural encoding of the associated teaching signals. We perturbed dopamine in healthy individuals using dopaminergic medication and asked them to predict variable rewards while we scanned their brains. Dopamine perturbations impaired learning and the neural encoding of reward variability, thus establishing a direct link between dopamine and adaptation to reward variability. These results aid our understanding of clinical conditions associated with dopaminergic dysfunction, such as psychosis.
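The adaptive coding idea described above (prediction errors expressed relative to reward variability, so that learning slows when rewards are more variable) can be illustrated with a minimal sketch. It assumes a simple delta (Rescorla-Wagner-style) rule in which the raw prediction error is divided by the SD of the current reward distribution. The helper names `delta_update` and `run_block`, the Gaussian reward distributions, the starting prediction of 0, and the parameter `eta` are hypothetical choices for illustration; this is not the computational model used in the study.

```python
import random


def delta_update(prediction, reward, sd, eta, adaptive=True):
    """One delta-rule step (hypothetical helper, not the study's model).

    Returns the updated prediction and the raw prediction error.
    With adaptive=True the error is divided by the reward SD, so the
    effective learning rate becomes eta / sd (assumed scaling).
    """
    error = reward - prediction                            # raw prediction error
    teaching_signal = error / sd if adaptive else error    # SD-relative error if adaptive
    return prediction + eta * teaching_signal, error


def run_block(mean, sd, n_trials, eta, adaptive=True, seed=0):
    """Simulate one block with rewards drawn from N(mean, sd).

    The Gaussian reward distribution and the starting prediction of 0
    are illustrative assumptions, not details of the original task.
    """
    rng = random.Random(seed)
    prediction = 0.0
    errors = []
    for _ in range(n_trials):
        reward = rng.gauss(mean, sd)                       # reward revealed after the prediction
        prediction, error = delta_update(prediction, reward, sd, eta, adaptive)
        errors.append(error)                               # trial-by-trial prediction errors
    return prediction, errors


if __name__ == "__main__":
    # Compare a low- and a high-variability block with the same mean reward.
    for sd in (5.0, 15.0):
        final, errors = run_block(mean=50.0, sd=sd, n_trials=200, eta=0.5)
        print(f"SD={sd}: final prediction {final:.1f}, first error {errors[0]:.1f}")
```

For example, `delta_update(0.0, 10.0, sd=5.0, eta=1.0)` moves the prediction to 2.0, whereas the same raw error with `sd=10.0` moves it only to 1.0. Under this assumed scaling the effective learning rate is `eta / sd`, so updates shrink in high-variability blocks, in line with the dampened learning the text describes.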

Dopamine Modulates Adaptive Prediction Error Coding in the Human Midbrain and Striatum