Abstract: Reinforcement learning (RL) is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that are later exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are ubiquitous in our daily lives and widely used in laboratory tasks because they can be motivating, the mechanisms by which they become motivating are less well understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., current number of accumulated tokens, choice options, task epoch, trials since last delivery of primary reinforcer), drives value and affects motivation. We constructed a Markov decision process model that computes the value of task states from task features, so that these values can be correlated with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to values of task states during the tokens task (n=5 monkeys, three male and two female). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.

Significance statement: Symbolic reinforcers, like money and points, play a critical role in our lives. However, we lack a mechanistic understanding of how symbolic reinforcement can be related to fluctuations in motivation. We investigated the effect of symbolic reinforcers on behaviors related to motivation during a token reinforcement learning task, using a novel reinforcement learning model and data from five monkeys. We designed a state-based model that can capture reward-predicting features and produce state values at sub-trial resolution. Our findings suggest that the value of a task state can affect willingness to initiate a trial, speed to choose, and persistence to complete a trial. Our model makes testable predictions for within-trial fluctuations of neural activity related to values of task states.
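
To make the idea of "computing the value of task states" concrete, here is a minimal sketch of value iteration on a toy token task. It is not the authors' model. The state definition (trial index within a cash-out cycle, tokens held), the fixed cycle length, the one-token-per-trial gain probabilities, the one-unit-of-reward-per-token exchange, and the discount factor are all assumptions chosen for illustration, and the function name `token_state_values` is hypothetical.

```python
def token_state_values(p_gain=(0.8, 0.2), trials_per_cycle=4, gamma=0.9, tol=1e-9):
    """Value iteration on a toy token MDP (illustrative assumptions only).

    State: (t, n) = (trial index within the cash-out cycle, tokens held).
    Action: choose an option i that yields one token with probability p_gain[i].
    After the last trial of a cycle, tokens are exchanged for reward
    (one unit per token) and the cycle restarts at (0, 0).
    """
    T = trials_per_cycle
    # At most one token can be gained per trial, so n <= t.
    states = [(t, n) for t in range(T) for n in range(t + 1)]
    V = {s: 0.0 for s in states}

    while True:
        delta = 0.0
        for (t, n) in states:
            best = float("-inf")
            for p in p_gain:
                q = 0.0
                for gained, prob in ((1, p), (0, 1.0 - p)):
                    n_next = n + gained
                    if t + 1 == T:
                        # Cash-out: immediate reward, then a fresh cycle.
                        q += prob * (n_next + gamma * V[(0, 0)])
                    else:
                        # No primary reward yet; value comes from the future.
                        q += prob * gamma * V[(t + 1, n_next)]
                best = max(best, q)
            delta = max(delta, abs(best - V[(t, n)]))
            V[(t, n)] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            return V


if __name__ == "__main__":
    values = token_state_values()
    for state in sorted(values):
        print(state, round(values[state], 3))
```

In this toy setting, state value rises with the number of tokens held and with proximity to cash-out, which is the kind of within-trial and across-trial variation one could relate to behavioral measures such as reaction time or abort frequency.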

Inverse Reinforcement Learning to Study Motivation in Mouse Behavioral Paradigms

An Evaluation Study of Intrinsic Motivation Techniques applied to Reinforcement Learning over Hard Exploration Environments

Modular inverse reinforcement learning for visuomotor behavior

Neural networks with motivation

Show me the Way: Intrinsic Motivation from Demonstrations

An Information-Theoretic Perspective on Intrinsic Motivation in Reinforcement Learning: A Survey

Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning

Neural activity ramps in frontal cortex signal extended motivation during learning

Novel and optimized mouse behavior enabled by fully autonomous HABITS: Home-cage Assisted Behavioral Innovation and Testing System

Intrinsic motivations and open-ended learning

Intrinsic Motivation in Model-based Reinforcement Learning: A Brief Review

Computational mechanisms underlying motivation to earn symbolic reinforcers

The Switchmaze: an open-design device for measuring motivation and drive switching in mice

Encoding Motivation Prediction Errors in the Human Dopaminergic Reward System

Validation of a touchscreen probabilistic reward task for mice: A reverse-translated assay with cross-species continuity

A unified strategy for implementing curiosity and empowerment driven reinforcement learning

Modeling sensory-motor decisions in natural behavior

Measuring Motivation and Reward‐Related Decision Making in the Rodent Operant Touchscreen System

Modeling Complex Animal Behavior with Latent State Inverse Reinforcement Learning

Neural and Computational Mechanisms of Motivation and Decision-making

Motif: Intrinsic Motivation from Artificial Intelligence Feedback