Abstract: How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted its behavioral and neuronal noise to support exploration. In that experiment, human subjects proactively generated a series of timed motor outputs. We found that sequential motor timing varied at two temporal scales: long-term correlations around the target interval due to memory drift, and short-term adjustments of timing variability according to feedback. We previously described these features of timing variability with an augmented Gaussian process termed the reward-sensitive Gaussian process (RSGP). Here we provide a mechanistic model and simulate the process using a recurrent neural network architecture. Recurrent connections provide the long-term serial correlation in motor timing; to produce reward-driven short-term variations, we introduced reward-dependent variability in the network connectivity, inspired by the stochastic nature of synaptic transmission in the brain. Our model recursively generates an output sequence that incorporates internal variability and external reinforcement within a Bayesian framework. We show that the model can learn the key features of human behavior. Unlike other neural network models, which search for a single network connectivity that best matches model predictions to observations, this model estimates the uncertainty associated with each outcome and therefore does a better job of teasing apart adjustable, task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning to continuous state control.
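
The two-timescale mechanism described above can be illustrated with a toy simulation. The Python sketch below is hypothetical and is not the authors' implementation. The function name `simulate_intervals`, the network size, the noise levels, the 800 ms target, the 80 ms reward window, and the rule that weight noise is small after a rewarded trial and large after an unrewarded one are all assumptions chosen for illustration. The recurrent state carried from trial to trial supplies slow, serially correlated drift. A fresh Gaussian perturbation of every recurrent weight on each trial stands in for stochastic synaptic transmission.

```python
import math
import random


def simulate_intervals(n_trials, target=800.0, tolerance=80.0, n_units=20,
                       gain=0.9, sigma_rewarded=0.02, sigma_unrewarded=0.2,
                       input_noise=0.05, out_scale=200.0, seed=0):
    """Toy recurrent network producing one timed interval (ms) per trial.

    Hypothetical sketch, not the authors' model: the recurrent state h carries
    over between trials (slow drift), and on every trial each recurrent weight
    is perturbed by Gaussian "synaptic" noise whose standard deviation depends
    on whether the previous trial was rewarded. All parameters are assumptions.
    """
    rng = random.Random(seed)
    # Fixed random recurrent weights; entry variance gain**2 / n_units gives a
    # spectral radius of roughly `gain`, so the state decays slowly.
    w_std = gain / math.sqrt(n_units)
    W = [[rng.gauss(0.0, w_std) for _ in range(n_units)] for _ in range(n_units)]
    # Linear readout mapping network state to a deviation from the target.
    w_out = [rng.gauss(0.0, 1.0 / math.sqrt(n_units)) for _ in range(n_units)]

    h = [0.0] * n_units
    prev_rewarded = True
    intervals, rewards = [], []
    for _ in range(n_trials):
        # Reward-dependent variability: small weight noise after a reward,
        # larger weight noise (more exploration) after an error.
        sigma = sigma_rewarded if prev_rewarded else sigma_unrewarded
        # The whole new state is computed from the previous h before rebinding.
        h = [
            math.tanh(
                sum((W[i][j] + rng.gauss(0.0, sigma)) * h[j] for j in range(n_units))
                + rng.gauss(0.0, input_noise)
            )
            for i in range(n_units)
        ]
        interval = target + out_scale * sum(w * x for w, x in zip(w_out, h))
        # Binary feedback: rewarded if the produced interval is within tolerance.
        prev_rewarded = abs(interval - target) <= tolerance
        intervals.append(interval)
        rewards.append(prev_rewarded)
    return intervals, rewards
```

Setting both weight-noise scales and `input_noise` to zero leaves the network at rest, so every produced interval equals the target. Raising `sigma_unrewarded` relative to `sigma_rewarded` tends to enlarge trial-to-trial changes after errors. The sketch omits the Bayesian inference and the fitting to human data described above.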