Abstract: It has been suggested that dopamine (DA) represents the reward prediction error (RPE) defined in reinforcement learning, and therefore that DA responds to unpredicted but not to predicted reward. However, recent studies have found sustained DA responses to predictable reward in tasks involving self-paced behavior, and have suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if learned values decay (are forgotten), which can be implemented as decay of the synaptic strengths that store the learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that value decay can, counterintuitively, enhance motivation; specifically, it facilitates fast goal-reaching. Mathematical analyses revealed two potential underlying mechanisms: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards the goal, and (2) value contrasts between 'Go' and 'No-Go' are generated because chosen values are continually updated whereas unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) the slowdown of behavior caused by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing the seeking of easily obtainable rewards, and (iii) the relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value learning and motivation. Our results also suggest that when biological systems for value learning remain active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, in which learning and forgetting are balanced.
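Mechanism (1) and the dynamic-equilibrium idea can be illustrated with a minimal sketch. It is not the authors' model: it assumes a linear track on which the agent always chooses 'Go', a standard temporal-difference update, and a multiplicative decay applied to every stored value at each time step. The function name `simulate_go_chain` and all parameter values (`alpha`, `gamma`, `decay`, the number of states and trials) are illustrative assumptions, not taken from the paper. Mechanism (2), which requires a choice between 'Go' and 'No-Go', is not covered here.

```python
def simulate_go_chain(n_states=5, reward=1.0, alpha=0.5, gamma=1.0,
                      decay=0.01, n_trials=500):
    """TD learning on a linear track where the agent always chooses 'Go'.

    Hypothetical illustration: parameter values are assumptions.
    Returns (values, rpes) from the final trial, where values[i] is the
    learned value of 'Go' at state i and rpes[i] is the RPE generated
    when leaving state i.
    """
    values = [0.0] * n_states
    rpes = [0.0] * n_states
    for _ in range(n_trials):
        for s in range(n_states):
            is_last = s == n_states - 1
            r = reward if is_last else 0.0              # reward only at the goal
            next_value = 0.0 if is_last else values[s + 1]
            rpe = r + gamma * next_value - values[s]    # TD error (RPE)
            values[s] += alpha * rpe                    # learning
            rpes[s] = rpe
            # Forgetting: every stored value shrinks a little at each step.
            values = [(1.0 - decay) * v for v in values]
    return values, rpes


if __name__ == "__main__":
    for d in (0.0, 0.01):
        v, e = simulate_go_chain(decay=d)
        print(f"decay={d}")
        print("  values:", [round(x, 3) for x in v])
        print("  RPEs:  ", [round(x, 3) for x in e])
```

Under these assumptions, with `decay=0.0` the RPEs vanish after learning and, because `gamma=1.0`, all values become equal. With `decay=0.01` the RPEs stay positive at every state and the values rise towards the goal, so learning and forgetting balance each other rather than learning simply stopping.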

A reinforcement learning model with choice traces for a progressive ratio schedule

STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization

Validation and Optimisation of a Touchscreen Progressive Ratio Test of Motivation in Male Rats

Behavioral Representation of Cost and Benefit Balance in Rats

A mismatch between striatal cholinergic pauses and dopaminergic reward prediction errors

Dissociable Effects of D-Amphetamine, Chlordiazepoxide and Alpha-Flupenthixol on Choice and Rate Measures of Reinforcement in the Rat.

Computational modelling reveals contrasting effects on reinforcement learning and cognitive flexibility in stimulant use disorder and obsessive-compulsive disorder: remediating effects of dopaminergic D2/3 receptor agents

Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning.

The role of reinforcement learning in shaping the decision policy in methamphetamine use disorders

The Effects of D-Amphetamine, Chlordiazepoxide, α-Flupenthixol and Behavioural Manipulations on Choice of Signalled and Unsignalled Delayed Reinforcement in Rats

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Dopamine Release Plateau and Outcome Signals in Dorsal Striatum Contrast with Classic Reinforcement Learning Formulations

Change point estimation by the mouse medial frontal cortex during probabilistic reward learning

Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation

An operant social self-administration and choice model in rats

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

Tracking subjects’ strategies in behavioural choice experiments at trial resolution

Modeling Psychological Refractory Period (PRP) and Practice Effect on PRP with Queuing Networks and Reinforcement Learning Algorithms

Dopamine reports reward prediction errors, but does not update policy, during inference-guided choice

HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments