Abstract: Replaying past experiences has proven to be a highly effective approach for averting catastrophic forgetting in supervised continual learning. However, some crucial factors are still largely ignored, leaving replay vulnerable to serious failure when it is used as a solution to forgetting in continual reinforcement learning, even in the perfect-memory setting where all data from previous tasks are accessible during the current task. On the one hand, since most reinforcement learning algorithms are not invariant to the reward scale, previously well-learned tasks (with high rewards) may appear more salient to the current learning process than the current task (with small initial rewards). This causes the agent to concentrate on those salient tasks at the expense of generality on the current task. On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting. In this paper, we introduce RECALL, a replay-enhanced method that greatly improves the plasticity of existing replay-based methods on new tasks while effectively avoiding the recurrence of catastrophic forgetting in continual reinforcement learning. RECALL leverages adaptive normalization on approximate targets and policy distillation on old tasks to enhance generality and stability, respectively. Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than pure perfect-memory replay and achieves overall performance comparable to or better than state-of-the-art continual learning methods.
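The reward-scale issue described in the abstract can be illustrated with a small sketch. The code below is not RECALL's implementation; it is a generic per-task running normalizer (Welford's algorithm), and the class name `RunningTargetNormalizer`, the task labels, and the target values are all hypothetical choices made for illustration. It only shows how targets from a well-learned task (large values) and a new task (small values) can be brought onto a comparable scale.

```python
import math


class RunningTargetNormalizer:
    """Hypothetical helper: tracks a running mean and variance of targets per task."""

    def __init__(self, eps=1e-8):
        self.eps = eps
        self.stats = {}  # task_id -> (count, mean, sum of squared deviations)

    def update(self, task_id, target):
        # Welford's online update for the mean and squared deviations.
        count, mean, m2 = self.stats.get(task_id, (0, 0.0, 0.0))
        count += 1
        delta = target - mean
        mean += delta / count
        m2 += delta * (target - mean)
        self.stats[task_id] = (count, mean, m2)

    def normalize(self, task_id, target):
        # Standardize a target using the statistics of its own task.
        count, mean, m2 = self.stats[task_id]
        std = math.sqrt(m2 / count) if count > 1 else 1.0
        return (target - mean) / (std + self.eps)


# Illustrative values only: an old task with large targets, a new task with small ones.
old_task_targets = [900.0, 1000.0, 1100.0]
new_task_targets = [0.9, 1.0, 1.1]

normalizer = RunningTargetNormalizer()
for t in old_task_targets:
    normalizer.update("old", t)
for t in new_task_targets:
    normalizer.update("new", t)

old_scaled = [normalizer.normalize("old", t) for t in old_task_targets]
new_scaled = [normalizer.normalize("new", t) for t in new_task_targets]
print(old_scaled)
print(new_scaled)
```

Without normalization, the old task's targets are about a thousand times larger than the new task's, so a scale-sensitive loss would be dominated by the old task. After per-task standardization, both sets of targets occupy roughly the same range.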

Understanding the effect of varying amounts of replay per step

Replay across Experiments: A Natural Extension of Off-Policy RL

High-Value Prioritized Experience Replay For Off-Policy Reinforcement Learning

DQN with model-based exploration: efficient learning on environments with sparse rewards

An Approach to Optimize Replay Buffer in Value-Based Reinforcement Learning

A model of hippocampal replay driven by experience and environmental structure facilitates spatial learning

Investigating the Interplay of Prioritized Replay and Generalization

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Deep reinforcement learning via good choice resampling experience replay memory

Prioritized Experience Replay

A Benchmark and Empirical Analysis for Replay Strategies in Continual Learning

Watch Your Step: Optimal Retrieval for Continual Learning at Scale

Ddper - Decentralized Distributed Prioritized Experience Replay

Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Replay-enhanced Continual Reinforcement Learning

Reverse Experience Replay

Prioritised Experience Replay Based on Sample Optimisation

Model-Based and Model-Free Replay Mechanisms for Reinforcement Learning in Neurorobotics