Parallelized Synchronous Multi-agent Deep Reinforcement Learning with Experience Replay Memory

Xudong Gong, Bo Ding, Jie Xu, Huaimin Wang, Xing Zhou, Dawei Feng
DOI: https://doi.org/10.1109/sose.2019.00055
2019-01-01
Abstract: In deep reinforcement learning, reusing training data from different time-steps through an Experience Replay Memory (ERM) can significantly reduce non-stationarity and decorrelate updates. However, in Multi-Agent Deep Reinforcement Learning (MA-DRL), particularly in independent Q-learning problems, ERM may introduce obsolete experiences into the training process, so the policy network inevitably needs more training time to converge. In this paper, we propose an approach that exploits parallelization in resource-rich environments (e.g., the cloud) to generate decorrelated training data instead of relying purely on ERM in MA-DRL. We extend the synchronous method, a parallel method originally proposed for single-agent DRL, to the multi-agent setting. To avoid the drawback of the synchronous method, we combine it with ERM and apply the improved method to lenient deep reinforcement learning. As a result, our algorithm, the lenient ERM-helped synchronous n-step deep Q-network (LESnDQN), takes advantage of both ERM and parallelization. LESnDQN has been tested on an extended variation of the coordinated multi-agent object transportation problem. The results show that LESnDQN takes fewer training episodes and less computation time to converge than the state-of-the-art algorithm with similar parameter settings.