Abstract: The recently developed 'two-step' behavioural task promises to differentiate model-based from model-free reinforcement learning, while generating neurophysiologically friendly decision datasets with parametric variation of decision variables. These desirable features have prompted its widespread adoption. Here, we analyse the interactions between a range of different strategies and the structure of transitions and outcomes, in order to examine constraints on what can be learned from behavioural performance. The task involves a trade-off between the need for stochasticity, to allow strategies to be discriminated, and the need for determinism, so that it is worth subjects' investment of effort to exploit the contingencies optimally. We show through simulation that, under certain conditions, model-free strategies can masquerade as model-based. First, we show that seemingly innocuous modifications to the task structure can induce correlations between action values at the start of a trial and the subsequent trial events, such that analyses based on comparing successive trials can lead to erroneous conclusions. We confirm the power of a suggested correction to the analysis that can alleviate this problem. We then consider model-free reinforcement learning strategies that exploit correlations between where rewards are obtained and which actions have high expected value. These generate behaviour that appears model-based under these analyses, and also under more sophisticated ones. Exploiting the full potential of the two-step task as a tool for behavioural neuroscience requires an understanding of these issues.
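The "analysis based on comparing successive trials" mentioned above is usually a stay-probability analysis. It measures how often the first-stage choice is repeated, split by whether the previous trial was rewarded and whether its transition was common or rare. The sketch below is a minimal illustration of that analysis, not the authors' code. Every parameter is a hypothetical choice made for illustration: `COMMON_PROB`, `DRIFT_SD`, the reward-probability bounds, `alpha`, `beta`, and the simplification to one action per second-stage state. The helper names `simulate`, `stay_probabilities` and `interaction` are also invented here.

```python
import numpy as np

# Hypothetical task parameters, chosen for illustration only.
COMMON_PROB = 0.7          # P(first-stage action a leads to its "common" second-stage state a)
DRIFT_SD = 0.025           # SD of the Gaussian random walk on second-stage reward probabilities
P_MIN, P_MAX = 0.25, 0.75  # bounds on reward probabilities (clipped, a simplification)


def simulate(agent, n_trials=10000, alpha=0.5, beta=5.0, seed=0):
    """Simulate a simplified two-step task for a 'model_free' or 'model_based' agent."""
    rng = np.random.default_rng(seed)
    q_mf = np.zeros(2)             # model-free values of the two first-stage actions
    v_s2 = np.zeros(2)             # learned values of the two second-stage states
    reward_probs = np.full(2, 0.5)
    choices = np.empty(n_trials, dtype=int)
    common = np.empty(n_trials, dtype=bool)
    rewards = np.empty(n_trials, dtype=bool)
    for t in range(n_trials):
        if agent == "model_free":
            q = q_mf
        else:
            # Model-based: combine second-stage values with the known transition
            # structure (action a leads commonly to state a, rarely to 1 - a).
            q = COMMON_PROB * v_s2 + (1 - COMMON_PROB) * v_s2[::-1]
        # Softmax choice between the two first-stage actions.
        p_choose_1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        c = int(rng.random() < p_choose_1)
        is_common = rng.random() < COMMON_PROB
        state = c if is_common else 1 - c
        r = 1.0 if rng.random() < reward_probs[state] else 0.0
        # Both agents learn second-stage values; the model-free agent also directly
        # reinforces its first-stage choice with the outcome (a TD(1)-like update).
        v_s2[state] += alpha * (r - v_s2[state])
        q_mf[c] += alpha * (r - q_mf[c])
        reward_probs = np.clip(reward_probs + rng.normal(0.0, DRIFT_SD, 2), P_MIN, P_MAX)
        choices[t], common[t], rewards[t] = c, is_common, r > 0
    return choices, common, rewards


def stay_probabilities(choices, common, rewards):
    """P(repeat first-stage choice), keyed by (previous rewarded?, previous common?)."""
    stay = choices[1:] == choices[:-1]
    prev_common, prev_reward = common[:-1], rewards[:-1]
    return {
        (rew, com): stay[(prev_reward == rew) & (prev_common == com)].mean()
        for rew in (True, False)
        for com in (True, False)
    }


def interaction(p):
    """Reward x transition interaction; a positive value is the usual model-based signature."""
    return (p[(True, True)] - p[(True, False)]) - (p[(False, True)] - p[(False, False)])


if __name__ == "__main__":
    for agent in ("model_free", "model_based"):
        p = stay_probabilities(*simulate(agent))
        print(agent, {k: round(float(v), 2) for k, v in p.items()},
              "interaction:", round(float(interaction(p)), 2))
```

Under these assumptions the model-based agent shows a large reward × transition interaction, and the model-free agent mainly shows a main effect of reward. Even here, the model-free interaction need not be exactly zero. The values an agent brings into a trial are correlated with which transitions end up rewarded, and that is the kind of correlation the abstract warns can mislead this analysis.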

Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task
