Continuous Policy Multi-Agent Deep Reinforcement Learning with Generalizable Episodic Memory

Wenjing Ni, Bo Wang, Hua Zhong, Xiang Guo
DOI: https://doi.org/10.1109/cac57257.2022.10055953
2022-01-01
Abstract: Multi-agent reinforcement learning (MARL) suffers from low sample efficiency. It needs far more samples than humans do to converge and learn successful strategies, and the problem is more severe in continuous state and policy spaces. Episodic memory (EM) improves the sample efficiency of reinforcement learning (RL) by imitating the human ability to learn rapidly, but so far it has received little attention in continuous policy spaces and in MARL. We therefore propose a continuous-policy multi-agent reinforcement learning method with generalizable episodic memory (ECM). ECM establishes a centralized memory parameter network and a memory buffer for each agent and updates the memory through implicit planning, so that the episodic memory model can use neural networks to learn successful strategies from past successful experience. This allows the model to handle continuous policy spaces. Moreover, ECM combines the centralized training with decentralized execution (CTDE) paradigm of MARL with the episodic memory model, so that the model adapts to multi-agent task environments. Simulation results show that ECM effectively improves the sample efficiency of MARL algorithms and that the learned strategies achieve higher accuracy.
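The abstract does not give ECM's exact memory-update rule. As a rough illustration of what "updating memory through implicit planning" can mean, the sketch below assumes a backward recursion in the style of generalizable episodic memory (GEM) for continuous control, R_t = r_t + γ · max(R_{t+1}, Q(s_{t+1}, a_{t+1})). At each step the memory target either follows the remembered trajectory or switches to the critic's estimate, whichever is larger. In a CTDE setting, the critic values would presumably come from a centralized critic evaluated on the joint action, with each agent keeping its own buffer of such targets. The function name implicit_planning_targets and its arguments are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def implicit_planning_targets(
    rewards: Sequence[float],
    q_next: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Hypothetical GEM-style implicit-planning targets for one episode.

    Assumption (not from the ECM paper): the memory target follows
    R_t = r_t + gamma * max(R_{t+1}, Q(s_{t+1}, a_{t+1})).

    rewards[t]: reward r_t received after acting at step t.
    q_next[t]:  critic estimate Q(s_{t+1}, a_{t+1}) for the successor step
                (in CTDE, assumed to come from a centralized critic on the
                joint action). Ignored at the last step, which is terminal.
    """
    n = len(rewards)
    if len(q_next) != n:
        raise ValueError("rewards and q_next must have the same length")
    if n == 0:
        return []

    targets = [0.0] * n
    # Terminal step: no successor state, so the target is the immediate reward.
    targets[n - 1] = float(rewards[n - 1])

    # Sweep backwards through the episode.
    for t in range(n - 2, -1, -1):
        # Implicit planning: keep following the remembered trajectory
        # (targets[t + 1]) or bootstrap from the critic, whichever is larger.
        best_future = max(targets[t + 1], q_next[t])
        targets[t] = rewards[t] + gamma * best_future
    return targets
```

For example, implicit_planning_targets([0.0, 0.0, 1.0], [2.0, 0.2, 0.0], gamma=0.5) returns [1.0, 0.5, 1.0]. At step 0, the critic's estimate for step 1 (2.0) exceeds the remembered return from step 1 (0.5), so the target bootstraps from the critic instead of the stored trajectory.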