Abstract: The dopaminergic reward system, which encodes reward prediction error (PE) signals, is vital for reinforcement learning (RL). Although this reward PE hypothesis has been extensively validated, there remains considerable debate over the alternative account based on motivation. In the current study, we diverted participants’ motivation from the conditioned stimulus (CS)-associated valences to the CS-elicited actions in a variant of the Pavlovian conditioning task under appetitive and aversive conditions. We found that regions in the dopaminergic reward system did not encode bidirectional reward PE signals, but rather the PE magnitudes, namely, the motivation PE signals. Because these neural signals do not indicate the direction of learning, they could not be used directly for model-free RL, but could probably be used for model-based control. Specifically, the ventral striatum during the feedback phase might encode the need to adjust the learning policy, while the putative substantia nigra pars compacta (SNc) in the midbrain and the putamen during the prediction phase might sustain the intended actions. Meanwhile, the primary motor cortex encoded the salience PE signals for model-free RL. Therefore, our findings demonstrate that the human dopaminergic reward system could encode motivation PE signals to support model-based control, rather than model-free learning, suggesting that its involvement in RL should be motivation-dependent.

What is dopamine doing in model-based reinforcement learning?

Rethinking dopamine as generalized prediction error

Dopamine, Updated: Reward Prediction Error and Beyond

Model-based predictions for dopamine

Mesolimbic dopamine encodes reward prediction errors independent of learning rates

Representation learning with reward prediction errors

Dopamine reports reward prediction errors, but does not update policy, during inference-guided choice

[Multiple Dopamine Signals and Their Contributions to Reinforcement Learning]

Dopamine transients encode reward prediction errors independent of learning rates

Believing in dopamine

A dopamine mechanism for reward maximization

Dopamine Prediction Errors in Reward Learning and Addiction: From Theory to Neural Circuitry

Dopamine-independent effect of rewards on choices through hidden-state inference

Dopamine reward prediction error coding

Dopamine transients do not act as model-free prediction errors during associative learning

Encoding Motivation Prediction Errors in the Human Dopaminergic Reward System

Dopamine, Prediction Error and Beyond

Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model