Abstract: Recent experimental and theoretical work on reinforcement learning has shed light on the neural bases of learning from rewards and punishments. One fundamental problem in reinforcement learning is the credit assignment problem: how to properly assign credit to actions that lead to reward or punishment after a delay. Temporal difference learning solves this problem, but its efficiency can be significantly improved by the addition of eligibility traces (ETs). In essence, ETs function as decaying memories of previous choices that are used to scale synaptic weight changes. Theoretical studies have shown that ETs spanning a number of actions may improve the performance of reinforcement learning. However, it remains an open question whether including ETs that persist over sequences of actions allows reinforcement learning models to better fit empirical data on the behavior of humans and other animals. Here, we report an experiment in which human subjects performed a sequential economic decision game where the long-term optimal strategy differed from the strategy that led to the greatest short-term return. We demonstrate that human subjects' performance in the task is significantly affected by the time between choices in a surprising and seemingly counterintuitive way. However, this behavior is naturally explained by a temporal difference learning model that includes ETs persisting across actions. Furthermore, we review recent findings suggesting that short-term synaptic plasticity in dopamine neurons may provide a realistic biophysical mechanism for producing ETs that persist on a timescale consistent with behavioral observations.
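To make the role of eligibility traces concrete, the following is a minimal sketch of textbook tabular TD(λ) with accumulating traces. It is not the model fitted in this work: the function name `td_lambda_episode`, the four-state chain, and the parameter values `alpha`, `gamma`, and `lam` are all illustrative assumptions. It only shows how a decaying trace scales each value update so that a delayed reward also credits earlier states.

```python
import numpy as np

def td_lambda_episode(values, states, rewards, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    `states` has one more entry than `rewards`: rewards[t] is received on
    the transition states[t] -> states[t + 1]. The last state is treated
    as terminal. All parameter defaults are illustrative assumptions.
    """
    values = np.array(values, dtype=float)   # work on a copy
    traces = np.zeros_like(values)           # one eligibility trace per state
    last = len(rewards) - 1
    for t, reward in enumerate(rewards):
        s, s_next = states[t], states[t + 1]
        # Bootstrap from the next state unless it is terminal.
        next_value = 0.0 if t == last else values[s_next]
        delta = reward + gamma * next_value - values[s]  # TD error
        traces *= gamma * lam   # memories of earlier visits decay
        traces[s] += 1.0        # the state just visited is fully eligible
        values += alpha * delta * traces  # update scaled by each trace
    return values

# Hypothetical chain 0 -> 1 -> 2 -> 3 (terminal), reward only at the end.
states, rewards = [0, 1, 2, 3], [0.0, 0.0, 1.0]
no_trace = td_lambda_episode(np.zeros(4), states, rewards, lam=0.0)
with_trace = td_lambda_episode(np.zeros(4), states, rewards, lam=0.8)
```

With `lam=0.0` only the state immediately preceding the reward changes after one episode, whereas with `lam=0.8` the two earlier states also receive credit, in proportion to how much their traces have decayed.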

Demystifying the Recency Heuristic in Temporal-Difference Learning

Predicting Periodicity with Temporal Difference Learning

Temporal Difference Learning with Experience Replay

Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards

Revisiting a Design Choice in Gradient Temporal Difference Learning

On the Statistical Benefits of Temporal Difference Learning

Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

Almost Sure Convergence of Average Reward Temporal Difference Learning

The surprising efficiency of temporal difference learning for rare event prediction

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

Temporal-Difference Learning Using Distributed Error Signals

Statistical Efficiency of Distributional Temporal Difference Learning

A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

Per-decision Multi-step Temporal Difference Learning with Control Variates

An Analysis of Quantile Temporal-Difference Learning

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

Reanalysis of Variance Reduced Temporal Difference Learning

Short-term Memory Traces for Action Bias in Human Reinforcement Learning

Why Target Networks Stabilise Temporal Difference Methods

Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective