Abstract: Deep reinforcement learning has achieved significant success in various domains. However, it still faces a major challenge when learning multiple tasks in sequence, because interaction in a complex setting involves continual learning, in which the data distribution changes over time. A continual learning system should ensure that the agent acquires new knowledge without forgetting previous knowledge. However, catastrophic forgetting may occur, because new experience can overwrite previous experience when memory size is limited. The dual experience replay algorithm, which retains previous experience, is widely applied to reduce forgetting, but it does not scale to many tasks when the memory size is constrained. To alleviate the memory-size constraint, we propose a new continual reinforcement learning algorithm called Self-generated Long-term Experience Replay (SLER). Our method differs from the standard dual experience replay algorithm, which uses short-term experience replay to retain the current task's experience and long-term experience replay to retain the experience of all past tasks. We first train an environment sample model called Experience Replay Mode (ERM) to generate simulated state sequences of the previous tasks for knowledge retention. We then combine the ERM with the experience of the new task to generate simulated experience of all previous tasks to alleviate forgetting. Our method effectively reduces the memory required for multi-task reinforcement learning. We show that, in the StarCraft II and GridWorld environments, our method performs better than the state-of-the-art deep learning method and achieves results comparable to the dual experience replay method, which retains the experience of all tasks.
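
The contrast between storing past-task experience and generating it can be illustrated with a minimal sketch. This is not the authors' implementation: the class `DualReplay`, its methods, the `long_fraction` mixing ratio, and the `generator` callable (a hypothetical stand-in for a learned sample model such as the ERM) are all assumptions made for illustration. Without a generator, the long-term buffer archives every finished task, so stored memory grows with the number of tasks; with a generator, past-task samples are produced on demand and nothing is archived.

```python
import random
from collections import deque


class DualReplay:
    """Short-term buffer for the current task plus a source of past-task experience.

    Illustrative only: names and the mixing rule are assumptions, not the paper's API.
    """

    def __init__(self, short_capacity, generator=None, seed=0):
        self.short = deque(maxlen=short_capacity)  # FIFO buffer, current task only
        self.long = []                             # archived past-task transitions (baseline)
        self.generator = generator                 # hypothetical stand-in for a learned sample model
        self.rng = random.Random(seed)

    def add(self, transition):
        self.short.append(transition)

    def end_task(self):
        # Baseline: archive the finished task, so memory grows with each task.
        # With a generator, nothing is archived.
        if self.generator is None:
            self.long.extend(self.short)
        self.short.clear()

    def sample(self, batch_size, long_fraction=0.5):
        # Mix past-task and current-task transitions in one training batch.
        n_long = int(batch_size * long_fraction)
        if self.generator is not None:
            past = [self.generator(self.rng) for _ in range(n_long)]
        else:
            past = self.rng.choices(self.long, k=n_long) if self.long else []
        n_short = batch_size - len(past)
        current = self.rng.choices(list(self.short), k=n_short) if self.short else []
        return current + past

    def stored(self):
        # Number of transitions actually held in memory.
        return len(self.short) + len(self.long)
```

In this sketch, calling `end_task()` on the storing variant leaves the old transitions in `long`, whereas the generating variant reports `stored() == 0` after the same call and still returns past-task samples from `sample()`.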
