QDAP: Downsizing adaptive policy for cooperative multi-agent reinforcement learning

Zhitong Zhao, Ya Zhang, Siying Wang, Fan Zhang, Malu Zhang, Wenyu Chen
DOI: https://doi.org/10.1016/j.knosys.2024.111719
IF: 8.139
2024-04-01
Knowledge-Based Systems
Abstract: Existing multi-agent reinforcement learning methods employ the centralized training with decentralized execution (CTDE) paradigm to learn cooperative policies among agents. However, under conditions of continuous destruction, including information from dead agents significantly undermines the ability to learn effective cooperative policies in multi-agent systems. In this paper, we first analyze the bias introduced by dead agents under the CTDE paradigm and how it affects cooperation among agents. Following this, we propose the Q-learning-based downsizing adaptive policy (QDAP) framework for cooperative multi-agent reinforcement learning. QDAP actively discerns relevant values from dead agents and uses an innovative approach to convert historical trajectories into weighting factors, thereby helping the remaining active agents learn more appropriate cooperative policies. Moreover, we extend the proposed framework to the CTDE paradigm, enabling seamless integration with value decomposition methods. Experimental results demonstrate that QDAP significantly improves learning speed and achieves superior cooperation performance on challenging StarCraft II micromanagement benchmark tasks.
computer science, artificial intelligence