Abstract: Experience replay is a crucial technique for off-policy deep reinforcement learning. It uses a portion of memory as a replay buffer to store previous experience samples for later policy updates. Since each experience sample can be used multiple times, experience replay drastically improves sample utilization. However, how to effectively combine experience replay with multi-agent reinforcement learning is still an open challenge. In multi-agent reinforcement learning, an agent's decision must account for the dynamics of the environment as well as the behavior of other agents. If the policies of other agents change, updating the current policy with previous experience samples may degrade the agent's subsequent decisions. Some methods use a small-capacity replay buffer to store only recent experience samples. Although this avoids storing experience samples that are incompatible with the current policy, it reduces the diversity of experience samples in the replay buffer, so agents may be unable to learn the optimal strategy. This paper eases this conflict by enhancing the experience selection mechanism: 1) we use the reservoir retention algorithm to increase the diversity of experience samples in the replay buffer; 2) we use prioritized experience replay to alleviate the problem of experience samples in the replay buffer being incompatible with the current policy. Experimental results on the covert communication problem confirm the effectiveness of the proposed method.
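The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ReservoirPrioritizedBuffer`, the exponent `alpha`, the constant `eps`, and the use of a caller-supplied priority (for example an absolute TD error) are all assumptions made for illustration. Retention follows standard reservoir sampling (Algorithm R), and sampling is proportional to priority.

```python
import random


class ReservoirPrioritizedBuffer:
    """Illustrative buffer: reservoir retention plus proportional prioritized sampling."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha        # assumed exponent: 0 gives uniform sampling, 1 fully proportional
        self.eps = eps            # assumed small constant so zero-priority samples stay drawable
        self.items = []           # stored experience samples
        self.priorities = []      # one non-negative priority per stored sample
        self.seen = 0             # total number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, sample, priority=1.0):
        """Reservoir retention: every offered sample is kept with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
            self.priorities.append(priority)
            return
        slot = self.rng.randrange(self.seen)  # uniform integer in [0, seen)
        if slot < self.capacity:
            # Overwrite a stored sample; otherwise the new sample is discarded.
            self.items[slot] = sample
            self.priorities[slot] = priority

    def sample(self, batch_size):
        """Draw indices (with replacement) with probability proportional to (priority + eps) ** alpha."""
        weights = [(p + self.eps) ** self.alpha for p in self.priorities]
        indices = self.rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update_priority(self, index, priority):
        """Refresh a stored priority, e.g. after recomputing the sample's TD error."""
        self.priorities[index] = priority
```

Under these assumptions, reservoir retention keeps a uniform subsample of everything seen so far, which preserves older and more diverse experience, while the priority weights allow sampling to favor experience that remains informative under the current policy.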
