Abstract: The hypothesis that midbrain dopamine (DA) neurons broadcast an error for the prediction of reward (reward prediction error, RPE) is among the great successes of computational neuroscience. However, recent results contradict a core aspect of this theory: that the neurons uniformly convey a scalar, global signal. For instance, when animals are placed in a high-dimensional environment, DA neurons in the ventral tegmental area (VTA) display substantial heterogeneity in the features to which they respond, while also having more consistent RPE-like responses at the time of reward. We argue that the previously predominant family of extensions to the RPE model, which replicate the classic model in multiple parallel circuits, is ill-suited to explaining these and other results concerning DA heterogeneity within the VTA. Instead, we introduce a complementary “feature-specific RPE” model positing that DA neurons within VTA report individual RPEs for different elements of a population vector code for an animal’s state (moment-to-moment situation). To investigate this claim, we train a deep reinforcement learning model on a navigation and decision-making task and compare the feature-specific RPE derived from the network to population recordings from DA neurons during the same task. The model recapitulates key aspects of VTA DA neuron heterogeneity. Further, we show how our framework can be extended to explain patterns of heterogeneity in action responses reported among SNc DA neurons. Thus, our work provides a path to reconcile new observations of DA neuron heterogeneity with classic ideas about RPE coding, while also providing a new perspective on how the brain performs reinforcement learning in high-dimensional environments.
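The idea of reporting one RPE per element of a state vector can be illustrated with a small sketch. It is not the paper's deep reinforcement learning model. It assumes a linear value function V(s) = sum_i w_i * phi_i(s) and, as a further assumption made only for illustration, splits the reward evenly across feature channels. The function names, the even reward split, and the example numbers are all hypothetical. Under these assumptions the per-feature errors differ from one another, yet they sum to the classic scalar temporal-difference RPE.

```python
from typing import List, Sequence


def feature_specific_rpes(
    phi_now: Sequence[float],
    phi_next: Sequence[float],
    weights: Sequence[float],
    reward: float,
    gamma: float = 0.9,
) -> List[float]:
    """Return one temporal-difference error per state feature.

    Assumption (illustrative only): the reward is divided evenly
    across the feature channels.
    """
    n = len(weights)
    # Per-feature value contributions: V_i(s) = w_i * phi_i(s)
    v_now = [w * p for w, p in zip(weights, phi_now)]
    v_next = [w * p for w, p in zip(weights, phi_next)]
    return [reward / n + gamma * vn - vc for vn, vc in zip(v_next, v_now)]


def scalar_rpe(
    phi_now: Sequence[float],
    phi_next: Sequence[float],
    weights: Sequence[float],
    reward: float,
    gamma: float = 0.9,
) -> float:
    """Classic scalar TD error for the same linear value function."""
    v_now = sum(w * p for w, p in zip(weights, phi_now))
    v_next = sum(w * p for w, p in zip(weights, phi_next))
    return reward + gamma * v_next - v_now


# Hypothetical three-feature state before and after one transition.
phi_now = [1.0, 0.0, 0.5]
phi_next = [0.0, 1.0, 0.5]
weights = [0.2, 0.4, -0.1]

per_feature = feature_specific_rpes(phi_now, phi_next, weights, reward=1.0)
total = scalar_rpe(phi_now, phi_next, weights, reward=1.0)
```

In this sketch, `per_feature` plays the role of heterogeneous single-neuron signals, while `sum(per_feature)` matches `total` up to floating-point error, which mirrors the reconciliation with classic RPE coding that the abstract describes.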

A feature-specific prediction error model explains dopaminergic heterogeneity

Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model

Rethinking dopamine as generalized prediction error

Dopamine, Updated: Reward Prediction Error and Beyond

Dopamine transients encode reward prediction errors independent of learning rates

Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time

The Dopamine Prediction Error: Contributions to Associative Models of Reward Learning

Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework

Mesolimbic dopamine encodes reward prediction errors independent of learning rates

Dopamine Prediction Errors in Reward Learning and Addiction: From Theory to Neural Circuitry

Model-based predictions for dopamine

Representation learning with reward prediction errors

Dopamine reward prediction error coding

The many worlds hypothesis of dopamine prediction error: implications of a parallel circuit architecture in the basal ganglia

Dopamine transients do not act as model-free prediction errors during associative learning

Model-based reward prediction in the primate prefrontal cortex

Encoding Motivation Prediction Errors in the Human Dopaminergic Reward System

Dopamine neurons report an error in the temporal prediction of reward during learning