Abstract: Reinforcement learning (RL) models are used extensively to study human behavior. These models rely on normative accounts of behavior and stress interpretability over predictive capabilities. More recently, neural network models have emerged as a descriptive modeling paradigm that is capable of high predictive power yet with limited interpretability. Here, we seek to augment the expressiveness of theoretical RL models with the high flexibility and predictive power of neural networks. We introduce a novel framework, which we term theoretical-RNN (t-RNN), whereby a recurrent neural network is trained to predict trial-by-trial behavior and to infer theoretical RL parameters using artificial data of RL agents performing a two-armed bandit task. In three studies, we then examined the use of our approach to dynamically predict unseen behavior along with time-varying theoretical RL parameters. We first validated our approach using synthetic data with known RL parameters. Next, as a proof of concept, we applied our framework to two independent datasets of humans performing the same task. In the first dataset, we describe differences in the dynamics of theoretical RL parameters between a clinical psychiatric group and healthy controls. In the second dataset, we show that the exploration strategies of humans varied dynamically in response to task phase and difficulty. For all analyses, we found that t-RNN predicted actions better than the stationary maximum-likelihood RL method. We discuss the use of neural networks to facilitate the estimation of latent RL parameters underlying choice behavior.

Currently, neural network models fitted directly to human behavioral data are thought to dramatically outperform theoretical computational models in terms of predictive accuracy. However, these networks do not provide a clear theoretical interpretation of the mechanisms underlying the observed behavior. Generating plausible theoretical explanations for observed human data is a major goal in computational neuroscience. Here, we provide a proof of concept for a novel method in which a recurrent neural network (RNN) is trained on artificial data generated from a known theoretical model to predict both trial-by-trial actions and theoretical parameters. We then freeze the RNN weights and use the network to predict both the actions and the theoretical parameters for empirical data. We first validate our approach using synthetic data where the theoretical parameters are known. We then show, using two empirical datasets, that our approach allows dynamic estimation of latent parameters while providing better action predictions than theoretical models fitted with a maximum-likelihood approach. This proof of concept suggests that neural networks can be trained to predict meaningful time-varying theoretical parameters.
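
To make the data-generation step concrete, the following minimal sketch simulates one artificial agent on a two-armed bandit task with time-varying parameters. The abstract does not specify the agent model, so everything here is an assumption made for illustration: a Q-learning agent with a softmax policy, a learning rate `alpha` and an inverse temperature `beta` as the theoretical parameters, and the particular parameter trajectories and reward probabilities. The function name `simulate_agent` is hypothetical.

```python
import math
import random


def simulate_agent(n_trials, reward_probs, alphas, betas, rng):
    """Simulate an (assumed) Q-learning agent with a softmax policy.

    alphas[t] and betas[t] are the learning rate and inverse temperature
    used on trial t, so the parameters may change over time.
    Returns one record per trial: (action, reward, alpha, beta).
    """
    q = [0.0, 0.0]  # action values for the two arms
    trials = []
    for t in range(n_trials):
        alpha, beta = alphas[t], betas[t]
        # With two options, softmax reduces to a logistic function
        # of the difference between the two action values.
        p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        action = 1 if rng.random() < p_arm1 else 0
        reward = 1 if rng.random() < reward_probs[action] else 0
        # Delta-rule update of the chosen arm only.
        q[action] += alpha * (reward - q[action])
        trials.append((action, reward, alpha, beta))
    return trials


rng = random.Random(0)  # fixed seed so the simulation is reproducible
n_trials = 200
# Hypothetical trajectories: the learning rate steps down halfway through,
# and the inverse temperature ramps up linearly from 1 to 5.
alphas = [0.6 if t < n_trials // 2 else 0.2 for t in range(n_trials)]
betas = [1.0 + 4.0 * t / (n_trials - 1) for t in range(n_trials)]
data = simulate_agent(n_trials, (0.3, 0.7), alphas, betas, rng)
```

In a setup like the one described, the action and reward columns would serve as network inputs, while the next action and the per-trial `alpha` and `beta` would serve as training targets. Many such agents with different trajectories would be needed to train a network.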

Probing relationships between reinforcement learning and simple behavioral strategies to understand probabilistic reward learning

Harnessing the flexibility of neural networks to predict dynamic theoretical parameters underlying human choice behavior

A Semiparametric Inverse Reinforcement Learning Approach to Characterize Decision Making for Mental Disorders

Modeling and Interpreting Real-world Human Risk Decision Making with Inverse Reinforcement Learning

Dissociable Neural Representations of Reinforcement and Belief Prediction Errors Underlie Strategic Learning

HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

Exploration in Model-based Reinforcement Learning with Randomized Reward

A novel technique for delineating the effect of variation in the learning rate on the neural correlates of reward prediction errors in model-based fMRI

Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

Explaining Conditions for Reinforcement Learning Behaviors from Real and Imagined Data

Reinforcement Learning with Perturbed Rewards

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

Look Around! Unexpected gains from training on environments in the vicinity of the target

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning

Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

An opponent striatal circuit for distributional reinforcement learning

Foundations of Multivariate Distributional Reinforcement Learning