Prioritized Experience Replay in Multi-Actor-Attention-Critic for Reinforcement Learning

Sheng Fan, Guanghua Song, Bowei Yang, Xiaohong Jiang
DOI: https://doi.org/10.1088/1742-6596/1631/1/012040
2020-01-01
Journal of Physics Conference Series
Abstract: Experience replay is a key technique in off-policy reinforcement learning (RL): it lets the agent reuse past experience and reduces the correlation between training samples. Multi-Actor-Attention-Critic (MAAC) is a successful off-policy multi-agent RL algorithm, owing to its good scalability. To accelerate convergence, we use prioritized experience replay (PER) to optimize experience selection in MAAC and propose the PER-MAAC algorithm. In PER-MAAC, the priority metric is based on the temporal-difference (TD) error during training. The algorithm is evaluated in the Multi-UAV Cooperative Navigation and Rover-Tower scenarios. The experimental results show that PER-MAAC improves the speed of convergence.
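The abstract states only that the priority is based on the temporal-difference error and does not give the exact formula. As a rough illustration, the sketch below assumes the common proportional variant of prioritized replay: priority |δ| + ε, sampling probability proportional to priority^α, and importance-sampling weights (N · P(i))^(−β) normalized by the largest weight in the batch. The class name, method names, and default hyperparameters are hypothetical and are not taken from the paper; a practical implementation would typically use a sum-tree rather than the linear scan shown here.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that they
        # are likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return idxs, batch, weights

    def update_priorities(self, idxs, td_errors):
        # After a critic update, refresh priorities from the new TD errors.
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, the sampled weights would scale each transition's critic loss, and `update_priorities` would be called with the TD errors computed on that batch. How PER-MAAC combines per-agent TD errors into a single priority is not specified in the abstract, so that step is left out here.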