Abstract: For over two decades, phasic activity in midbrain dopamine neurons was considered synonymous with the prediction error in temporal-difference reinforcement learning [1-4]. Central to this proposal is the notion that reward-predictive stimuli become endowed with the scalar value of predicted rewards. When these cues are subsequently encountered, their predictive value is compared to the value of the actual reward received, allowing for the calculation of prediction errors [5, 6]. Phasic firing of dopamine neurons was proposed to reflect this computation, [1, 2, -Abstract Truncated-

References cited in the abstract:
1. Schultz W., Dayan P., Montague P.R. A neural substrate of prediction and reward. Science. 1997; 275: 1593-1599. https://doi.org/10.1126/science.275.5306.1593
2. Waelti P., Dickinson A., Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001; 412: 43-48. https://doi.org/10.1038/35083500
3. Glimcher P.W. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl. Acad. Sci. USA. 2011; 108: 15647-15654. https://doi.org/10.1073/pnas.1014269108
4. Schultz W. Dopamine reward prediction-error signalling: a two-component response. Nat. Rev. Neurosci. 2016; 17: 183-195. https://doi.org/10.1038/nrn.2015.26
5. Sutton R.S., Barto A.G. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 1981; 88: 135-170. https://doi.org/10.1037/0033-295x.88.2.135
6. Rescorla R.A., Wagner A.R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black A., Prokasy W. Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts, 1972: 64-99
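The comparison the abstract describes, between a cue's predicted value and the reward actually received, can be illustrated with a minimal sketch of the standard temporal-difference error, delta = r + gamma * V(s') - V(s). This is a textbook form, not code from the paper; the function names, the learning rate `alpha`, the discount `gamma`, and the single-cue task are all assumptions made for illustration.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Temporal-difference prediction error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


def train_cue_value(n_trials=50, reward=1.0, alpha=0.1, gamma=1.0):
    """Learn the scalar value of a single cue that is always followed by `reward`.

    Hypothetical task: one cue, one reward, and the trial ends after the reward.
    """
    v_cue = 0.0
    errors = []
    for _ in range(n_trials):
        # The trial ends after reward delivery, so the next-state value is 0.
        delta = td_error(reward, 0.0, v_cue, gamma)
        # The cue's value moves toward the received reward by a fraction alpha.
        v_cue += alpha * delta
        errors.append(delta)
    return v_cue, errors


if __name__ == "__main__":
    v_cue, errors = train_cue_value()
    print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}, "
          f"cue value: {v_cue:.3f}")
```

Under these assumptions the error at reward delivery shrinks across trials as the cue acquires value, and omitting an expected reward (`td_error(0.0, 0.0, v_cue)`) yields a negative error.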

Dopamine transients delivered in learning contexts do not act as model-free prediction errors

Dopamine transients do not act as model-free prediction errors during associative learning

Dopamine transients encode reward prediction errors independent of learning rates

Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors

The Dopamine Prediction Error: Contributions to Associative Models of Reward Learning

Dopamine neurons report an error in the temporal prediction of reward during learning

Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model

Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework

Model-based predictions for dopamine

Dopamine transients follow a striatal gradient of reward time horizons

Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner

Dopamine reward prediction error coding

Mesolimbic dopamine encodes reward prediction errors independent of learning rates

[Multiple Dopamine Signals and Their Contributions to Reinforcement Learning]

Dopamine, Prediction Error and Beyond

Dopamine reward prediction error signal codes the temporal evaluation of a perceptual decision report

Dopamine, Inference, and Uncertainty

Dopamine neuron ensembles signal the content of sensory prediction errors