Continual Reinforcement Learning with Multi-Timescale Successor Features

Doina Precup, Raymond Chua, Christos Kaplanis, B. Richards
DOI: https://doi.org/10.32470/ccn.2022.1229-0
Abstract: Learning and memory consolidation in the brain occur over multiple timescales. Inspired by this observation, prior work has shown that catastrophic forgetting in reinforcement learning (RL) agents can be mitigated by consolidating Q-value function parameters at multiple timescales. In this work, we combine this approach with successor features, and show that by consolidating successor features and preferences learned over multiple timescales we can further mitigate catastrophic forgetting. In particular, we show that agents trained with this approach rapidly recall previously rewarding sites in large environments, whereas those trained without this decomposition and consolidation mechanism do not. These results therefore contribute to our understanding of the functional role of synaptic plasticity and memory systems operating at multiple timescales, and demonstrate that RL can be improved by capturing features of biological memory with greater fidelity.
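The two ingredients named in the abstract can be illustrated with a small sketch. It assumes the standard successor-feature decomposition Q(s, a) = psi(s, a) . w, with the "preferences" taken to be the reward weights w, and it models multi-timescale consolidation with a Benna-Fusi-style chain of coupled variables per parameter. The chain model, its constants (n_levels, g0, capacities doubling with depth), and the names ChainParam and sf_td_step are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List


def q_value(psi: List[float], w: List[float]) -> float:
    """Successor-feature decomposition: Q(s, a) = psi(s, a) . w."""
    return sum(p * wi for p, wi in zip(psi, w))


class ChainParam:
    """Hypothetical multi-timescale parameter: a chain of coupled variables u[0..K-1].

    u[0] is the visible weight that learning updates touch directly. Deeper
    variables have larger capacities C[k], so they change more slowly and pull
    the shallower ones back toward their past values (Benna-Fusi-style chain;
    all constants here are illustrative assumptions).
    """

    def __init__(self, n_levels: int = 3, g0: float = 0.25) -> None:
        self.u = [0.0] * n_levels
        # Capacities 1, 2, 4, ...: each level is slower than the one above it.
        self.C = [2.0 ** k for k in range(n_levels)]
        # Couplings between level k and k+1 shrink with depth.
        self.g = [g0 * 2.0 ** (-k) for k in range(n_levels - 1)]

    @property
    def value(self) -> float:
        return self.u[0]

    def step(self, delta: float) -> None:
        """Add a learning update to u[0], then take one Euler step of the chain."""
        old = list(self.u)
        new = list(self.u)
        new[0] += delta
        for k in range(len(old)):
            flow = 0.0
            if k > 0:
                flow += self.g[k - 1] * (old[k - 1] - old[k])
            if k < len(old) - 1:
                flow += self.g[k] * (old[k + 1] - old[k])
            new[k] += flow / self.C[k]
        self.u = new


def sf_td_step(psi_sa: List[ChainParam], phi_sa: List[float],
               psi_next: List[float], alpha: float = 0.1,
               gamma: float = 0.9) -> None:
    """TD update psi <- psi + alpha * (phi + gamma * psi' - psi), applied
    component-wise through each component's consolidation chain."""
    for p, f, pn in zip(psi_sa, phi_sa, psi_next):
        p.step(alpha * (f + gamma * pn - p.value))
```

In this sketch the chain conserves the capacity-weighted sum of its variables, so an update pushed into the fast variable is gradually shared with slower ones rather than lost. A later update that overwrites u[0] is then partly counteracted by the deeper levels, which is one way to read "consolidation at multiple timescales". Because psi and w are stored separately, a change in reward location would mainly require relearning w while the consolidated psi is retained.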