Multi-agent collaboration based on RGMAAC algorithm under partial observability
WANG Zi-hao, ZHANG Yan-xin, HUANG Zhi-qing, YIN Chen-kun
DOI: https://doi.org/10.13195/j.kzyjc.2022.0422
2023-01-01
Abstract: Multi-agent deep reinforcement learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems, and is an important method for developing multi-agent systems that exhibit swarm intelligence. Existing MADRL studies mainly design algorithms under the assumption that the environment is fully observable or that communication resources are unlimited. However, partial observability is an unavoidable problem in practical applications of multi-agent systems. For example, the observation range of each agent is usually limited, and no complete environmental information is available beyond that range, which makes multi-agent collaboration difficult. To address partial observability in real scenarios, this paper follows the paradigm of centralized training and decentralized execution, extends the deep reinforcement learning algorithm Actor-Critic to multi-agent systems, and adds communication channels and gating mechanisms between agents, yielding the recurrent gated multi-agent Actor-Critic (RGMAAC) algorithm. Agents communicate efficiently based on their historical action-observation sequences, and make decisions using their local observation, their historical observation sequence, and the observations shared by other agents through the communication channels. In addition, a multi-agent task of synchronous and fast arrival is designed on the basis of the multi-agent particle environment, with two reward functions and two task scenarios. The experimental results show that agents trained with the RGMAAC algorithm perform well and are more stable than the baseline algorithm when partial observability is pronounced in the task scenario.
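The decision input described in the abstract (local observation, a summary of the agent's own history, and gated observations shared by other agents) can be illustrated with a minimal sketch. It is not the RGMAAC implementation: the moving-average history update stands in for a recurrent cell, the scalar sigmoid gate and the averaging of incoming messages are assumptions, the history summary uses observations only (actions are omitted), and all function names (`update_history`, `gated_messages`, `policy_inputs`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def update_history(h, obs, decay=0.5):
    # Assumed stand-in for a recurrent cell: an exponential moving
    # average of past observations serves as the history summary.
    return [decay * hi + (1.0 - decay) * oi for hi, oi in zip(h, obs)]


def gated_messages(histories, observations, gate_weights):
    # Each agent scales the observation it shares by a scalar gate in (0, 1)
    # computed from its own history summary (assumed gate form).
    msgs = []
    for h, o, w in zip(histories, observations, gate_weights):
        g = sigmoid(dot(w, h))
        msgs.append([g * x for x in o])
    return msgs


def policy_inputs(histories, observations, messages):
    # Actor input for agent i: local observation, own history summary,
    # and the mean of the gated messages received from the other agents.
    # Requires at least two agents.
    n = len(observations)
    inputs = []
    for i in range(n):
        others = [messages[j] for j in range(n) if j != i]
        dim = len(observations[i])
        shared = [sum(m[k] for m in others) / len(others) for k in range(dim)]
        inputs.append(observations[i] + histories[i] + shared)
    return inputs


if __name__ == "__main__":
    random.seed(0)
    n_agents, obs_dim = 3, 4
    obs = [[random.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
    hist = [[0.0] * obs_dim for _ in range(n_agents)]
    weights = [[random.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
    hist = [update_history(h, o) for h, o in zip(hist, obs)]
    msgs = gated_messages(hist, obs, weights)
    for i, x in enumerate(policy_inputs(hist, obs, msgs)):
        print(f"agent {i}: input length {len(x)}")
```

In this sketch each actor sees only its own observation and the gated messages at execution time, which mirrors the decentralized-execution side of the paradigm; the centralized critic used during training is not shown.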