Abstract: The hypothesis that midbrain dopamine (DA) neurons broadcast an error for the prediction of reward (reward prediction error, RPE) is among the great successes of computational neuroscience. However, recent results contradict a core aspect of this theory: that the neurons uniformly convey a scalar, global signal. For instance, when animals are placed in a high-dimensional environment, DA neurons in the ventral tegmental area (VTA) display substantial heterogeneity in the features to which they respond, while also having more consistent RPE-like responses at the time of reward. We argue that the previously predominant family of extensions to the RPE model, which replicate the classic model in multiple parallel circuits, is ill-suited to explaining these and other results concerning DA heterogeneity within the VTA. Instead, we introduce a complementary “feature-specific RPE” model positing that DA neurons within VTA report individual RPEs for different elements of a population vector code for an animal’s state (moment-to-moment situation). To investigate this claim, we train a deep reinforcement learning model on a navigation and decision-making task and compare the feature-specific RPE derived from the network to population recordings from DA neurons during the same task. The model recapitulates key aspects of VTA DA neuron heterogeneity. Further, we show how our framework can be extended to explain patterns of heterogeneity in action responses reported among SNc DA neurons. Thus, our work provides a path to reconcile new observations of DA neuron heterogeneity with classic ideas about RPE coding, while also providing a new perspective on how the brain performs reinforcement learning in high-dimensional environments.
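The idea of reporting individual RPEs for different elements of a state vector can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: a linear value readout V(s) = sum_i w_i * phi_i(s), and reward split evenly across the feature channels so that the per-feature errors sum to the classic scalar RPE. The function names and all numbers are hypothetical, and the hand-picked feature vectors stand in for features that would come from a trained deep reinforcement learning network.

```python
from typing import List, Sequence


def scalar_rpe(weights: Sequence[float], phi_now: Sequence[float],
               phi_next: Sequence[float], reward: float,
               gamma: float = 0.9) -> float:
    """Classic temporal-difference error with a linear value function."""
    v_now = sum(w * f for w, f in zip(weights, phi_now))
    v_next = sum(w * f for w, f in zip(weights, phi_next))
    return reward + gamma * v_next - v_now


def feature_specific_rpes(weights: Sequence[float], phi_now: Sequence[float],
                          phi_next: Sequence[float], reward: float,
                          gamma: float = 0.9) -> List[float]:
    """One TD error per feature channel.

    Assumption (illustrative only): reward is divided equally among the
    channels, so the channel errors add up to the scalar RPE.
    """
    n = len(weights)
    return [
        reward / n + gamma * w * f_next - w * f_now
        for w, f_now, f_next in zip(weights, phi_now, phi_next)
    ]


# Hypothetical three-feature state code before and after one transition.
weights = [0.5, -0.2, 1.0]
phi_now = [1.0, 0.0, 0.5]
phi_next = [0.0, 1.0, 0.5]

# Mid-trial step without reward: channels respond differently (heterogeneity).
deltas_no_reward = feature_specific_rpes(weights, phi_now, phi_next, reward=0.0)

# Rewarded step: every channel receives the same reward share.
deltas_reward = feature_specific_rpes(weights, phi_now, phi_next, reward=1.0)
total_reward = scalar_rpe(weights, phi_now, phi_next, reward=1.0)
```

Under these assumptions the channel errors differ from one another whenever the features change differently across a transition, while reward adds the same offset to every channel. That is one way heterogeneous feature responses can coexist with more uniform responses at the time of reward.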
