Abstract: For over two decades, phasic activity in midbrain dopamine neurons was considered synonymous with the prediction error in temporal-difference reinforcement learning [1-4]. Central to this proposal is the notion that reward-predictive stimuli become endowed with the scalar value of predicted rewards. When these cues are subsequently encountered, their predictive value is compared to the value of the actual reward received, allowing for the calculation of prediction errors [5, 6]. Phasic firing of dopamine neurons was proposed to reflect this computation [1, 2, ...] -Abstract Truncated-

References

1. Schultz W., Dayan P., Montague P.R. A neural substrate of prediction and reward. Science. 1997; 275: 1593-1599. https://doi.org/10.1126/science.275.5306.1593
2. Waelti P., Dickinson A., Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001; 412: 43-48. https://doi.org/10.1038/35083500
3. Glimcher P.W. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl. Acad. Sci. USA. 2011; 108: 15647-15654. https://doi.org/10.1073/pnas.1014269108
4. Schultz W. Dopamine reward prediction-error signalling: a two-component response. Nat. Rev. Neurosci. 2016; 17: 183-195. https://doi.org/10.1038/nrn.2015.26
5. Sutton R.S., Barto A.G. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 1981; 88: 135-170. https://doi.org/10.1037/0033-295x.88.2.135
6. Rescorla R.A., Wagner A.R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black A., Prokasy W. Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts, 1972: 64-99.
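
The comparison described in the abstract, between the value a cue predicts and the reward actually received, can be illustrated with a minimal sketch. This is a simplified trial-level (Rescorla-Wagner-style) update rather than the specific model of any cited paper; the function names, the learning rate of 0.1, and the discount factor of 0.95 are illustrative assumptions, not values from the source.

```python
def prediction_error(reward, predicted_value):
    """Error = reward actually received minus the value the cue predicted."""
    return reward - predicted_value


def td_error(reward, next_value, current_value, discount=0.95):
    """Temporal-difference variant: also counts the discounted value of the next state.

    The discount factor is an assumed illustrative value.
    """
    return reward + discount * next_value - current_value


def update_value(predicted_value, reward, learning_rate=0.1):
    """Move the cue's scalar value toward the reward by a fraction of the error."""
    delta = prediction_error(reward, predicted_value)
    return predicted_value + learning_rate * delta, delta


def simulate(n_trials=50, reward=1.0, learning_rate=0.1):
    """Repeatedly pair one cue with a fixed reward, recording the error on each trial."""
    value = 0.0  # the cue starts with no predictive value
    errors = []
    for _ in range(n_trials):
        value, delta = update_value(value, reward, learning_rate)
        errors.append(delta)
    return value, errors


if __name__ == "__main__":
    value, errors = simulate()
    print("learned cue value:", value)
    print("first and last errors:", errors[0], errors[-1])
    # Omitting an expected reward after learning yields a negative error.
    print("error on reward omission:", prediction_error(0.0, value))
```

Under these assumptions the error is largest on the first pairing, shrinks as the cue's value approaches the reward, and turns negative when a predicted reward is withheld.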

Representation learning with reward prediction errors

Rethinking dopamine as generalized prediction error

The Dopamine Prediction Error: Contributions to Associative Models of Reward Learning

Dopamine reward prediction error coding

Dopamine, Prediction Error and Beyond

Dopamine neurons report an error in the temporal prediction of reward during learning

Believing in dopamine

Dopamine, Updated: Reward Prediction Error and Beyond

Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time

Dopamine reward prediction error signal codes the temporal evaluation of a perceptual decision report

Dopamine, Inference, and Uncertainty

Representation and Timing in Theories of the Dopamine System

Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner

Model-based predictions for dopamine

Mesolimbic dopamine encodes reward prediction errors independent of learning rates

Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model

[Multiple Dopamine Signals and Their Contributions to Reinforcement Learning]

A feature-specific prediction error model explains dopaminergic heterogeneity