Abstract: Reinforcement Learning (RL), especially Deep Reinforcement Learning (DRL), has made great progress in many areas, such as robotics, video games and autonomous driving. However, sample inefficiency remains a major obstacle to the widespread practical application of DRL. Inspired by decision making in the human brain, this problem can be addressed by incorporating instance-based learning, i.e. episodic memory. Many episodic-memory-based RL algorithms have emerged recently. However, these algorithms either replace the parametric DRL algorithm entirely with episodic control or incorporate episodic memory into only a single component of DRL. In contrast to previous works, this paper proposes a new sample-efficient reinforcement learning architecture that introduces a new episodic memory module and incorporates episodic memory into three key components of DRL: exploration, experience replay and the loss function. Taking the Deep Q-Network (DQN) algorithm as an example, the combination of our architecture with DQN is called High Efficient Episodic Memory DQN (HE-EMDQN). In HE-EMDQN, a new non-parametric episodic memory module is introduced to help calculate the loss and to modify the predicted value used for exploration. To accelerate learning from samples in experience replay, an auxiliary small buffer called the percentile best episode replay memory is designed; its samples are combined with samples from the regular replay memory to compose a mixed mini-batch. We show across the testing environments that our algorithm performs significantly better and is more sample-efficient than DQN and the recent Episodic Memory Deep Q-Network (EMDQN). This work provides a new perspective for other RL algorithms to improve sample efficiency by utilising episodic memory efficiently.
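The mixed mini-batch idea can be illustrated with a minimal Python sketch. The abstract does not give the selection rule, buffer sizes or mixing ratio, so everything below is an assumption made for illustration: the class name `MixedReplay`, the parameters `percentile` and `best_fraction`, and the rule that an episode enters the small buffer when its return reaches the chosen percentile of all returns seen so far. It should not be read as the HE-EMDQN implementation.

```python
import random
from collections import deque

import numpy as np


class MixedReplay:
    """Hypothetical sketch: a regular replay buffer plus a small buffer
    holding transitions from high-return episodes."""

    def __init__(self, main_capacity=10000, best_capacity=1000,
                 percentile=90.0, best_fraction=0.25, seed=0):
        self.main = deque(maxlen=main_capacity)   # regular replay memory
        self.best = deque(maxlen=best_capacity)   # small "best episode" memory
        self.returns = []                         # returns of finished episodes
        self.percentile = percentile              # assumed admission threshold
        self.best_fraction = best_fraction        # assumed share of each batch
        self.rng = random.Random(seed)

    def add_episode(self, transitions):
        """Store a finished episode of (s, a, r, s_next, done) tuples."""
        episode_return = sum(t[2] for t in transitions)
        self.returns.append(episode_return)
        self.main.extend(transitions)
        # Assumed rule: admit the episode to the small buffer if its return
        # reaches the chosen percentile of all returns observed so far.
        if episode_return >= np.percentile(self.returns, self.percentile):
            self.best.extend(transitions)

    def sample(self, batch_size):
        """Draw a mixed mini-batch from both buffers, without replacement."""
        n_best = min(int(batch_size * self.best_fraction), len(self.best))
        n_main = min(batch_size - n_best, len(self.main))
        batch = self.rng.sample(list(self.best), n_best)
        batch += self.rng.sample(list(self.main), n_main)
        return batch
```

In this sketch a transition from a good episode can appear in both buffers, so it is replayed more often than an average transition; whether the paper's buffer behaves this way is not stated in the abstract.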

Deep reinforcement learning via good choice resampling experience replay memory

A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning.

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Experience Selection In Multi-Agent Deep Reinforcement Learning

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Deep Q-Learning with Prioritized Sampling.

Sample Efficient Reinforcement Learning Method Via High Efficient Episodic Memory.

Prioritised Experience Replay Based on Sample Optimisation

High-Value Prioritized Experience Replay For Off-Policy Reinforcement Learning

Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning

Prioritized Experience Replay

Sample-Efficient Deep Reinforcement Learning Via Balance Sample

Understanding the effect of varying amounts of replay per step

A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Replay-enhanced Continual Reinforcement Learning

A Dual Memory Structure for Efficient Use of Replay Memory in Deep Reinforcement Learning

A Sample Aggregation Approach to Experiences Replay of Dyna-Q Learning.

Deep Q-learning Sampling Based on Advantages

Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

Sample Efficient Reinforcement Learning Using Graph-Based Memory Reconstruction.

Prioritized Experience Replay for Multi-agent Cooperation