Research on Multi-Agent Communication and Collaborative Decision-Making Based on Deep Reinforcement Learning

Zeng Da
DOI: https://doi.org/10.48550/arXiv.2305.17141
2023-05-23
Abstract: In multi-agent environments, the mainstream approach to overcoming or alleviating environmental non-stationarity is the Centralized Training with Decentralized Execution (CTDE) framework. Building on CTDE, this thesis studies cooperative multi-agent decision-making based on the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm. To alleviate the non-stationarity of the multi-agent environment, it introduces a multi-agent communication mechanism based on weight scheduling and an attention module. By exchanging information, agents can mitigate the non-stationarity caused by their local observations, which assists their collaborative decision-making. Specifically, a communication module is added to the policy network. The module consists of a weight generator, a weight scheduler, a message encoder, a message pool and an attention module. The weight generator and weight scheduler produce weights that serve as the basis for selecting which messages are communicated; the message encoder compresses and encodes the communicated information; the message pool stores the communication messages; and the attention module lets each agent jointly process its own information and the received messages. The thesis proposes the Multi-Agent Communication and Global Information Optimization Proximal Policy Optimization (MCGOPPO) algorithm and evaluates it in the StarCraft Multi-Agent Challenge (SMAC) and the Multi-Agent Particle Environment (MPE). The experimental results show that the improvements are effective to a certain extent: they better alleviate the non-stationarity of the multi-agent environment and improve collaborative decision-making among the agents.
Multiagent Systems, Artificial Intelligence, Machine Learning
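The communication module described in the abstract can be illustrated with a minimal, untrained sketch in plain Python (standard library only). Everything below is an assumption made for illustration: the class name `CommModule`, the sigmoid weight generator, the "only the `top_k` highest-weight agents broadcast" scheduling rule, the tanh message encoder and the scaled dot-product attention are hypothetical stand-ins, not the thesis's actual networks, and all parameters are random rather than learned.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, x) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class CommModule:
    """Hypothetical sketch: weight generator + scheduler, encoder, message pool, attention."""

    def __init__(self, obs_dim, msg_dim, top_k, seed=0):
        rng = random.Random(seed)
        rand = lambda rows, cols: [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.w_gen = [rng.gauss(0.0, 0.1) for _ in range(obs_dim)]  # weight generator (assumed linear + sigmoid)
        self.W_enc = rand(msg_dim, obs_dim)    # message encoder (assumed linear + tanh)
        self.W_query = rand(msg_dim, obs_dim)  # query projection used by the attention step
        self.msg_dim = msg_dim
        self.top_k = top_k

    def generate_weight(self, obs):
        # Scalar communication weight in (0, 1) for one agent.
        return 1.0 / (1.0 + math.exp(-dot(self.w_gen, obs)))

    def schedule(self, weights):
        # Weight scheduler (assumed rule): only the top_k agents broadcast a message.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        return set(ranked[: self.top_k])

    def encode(self, obs):
        # Compress an observation into a bounded message vector.
        return [math.tanh(v) for v in matvec(self.W_enc, obs)]

    def step(self, observations):
        weights = [self.generate_weight(o) for o in observations]
        senders = self.schedule(weights)
        pool = {i: self.encode(observations[i]) for i in senders}  # message pool
        features = []
        for i, obs in enumerate(observations):
            incoming = [j for j in pool if j != i]  # an agent ignores its own message
            context = [0.0] * self.msg_dim
            if incoming:
                query = matvec(self.W_query, obs)
                scores = [dot(query, pool[j]) / math.sqrt(self.msg_dim) for j in incoming]
                for a, j in zip(softmax(scores), incoming):
                    context = [c + a * m for c, m in zip(context, pool[j])]
            # Combine the agent's own information with the attended messages.
            features.append(list(obs) + context)
        return weights, senders, features


comm = CommModule(obs_dim=4, msg_dim=3, top_k=2, seed=0)
obs = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
weights, senders, features = comm.step(obs)
print(sorted(senders), [len(f) for f in features])
```

Each agent's output vector is its own observation followed by an attention-weighted summary of the pooled messages. In an actual policy network this vector would feed the actor head, and the generator, encoder and attention parameters would be trained end to end with the MAPPO objective rather than left at random values.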
What problem does this paper attempt to address?
This paper addresses the non-stationarity of multi-agent environments: because every agent's policy keeps changing during learning, the environment faced by each agent keeps changing as well, which makes cooperative decision-making among multiple agents difficult. Specifically, the paper tackles two problems:

1. **Introducing a multi-agent communication mechanism**: To alleviate non-stationarity, the paper proposes a multi-agent communication mechanism based on weight scheduling and an attention module. By exchanging and sharing information, agents reduce the non-stationarity caused by local observations, which assists their collaborative decision-making. The mechanism consists of a communication selection module (message encoder, weight generator and weight scheduler) and a message processing module (attention module).
2. **Optimizing the processing of global information**: Within the CTDE framework, global information is introduced during the centralized training phase to alleviate non-stationarity. However, MAPPO processes this global information with some redundancy. The paper therefore proposes a global information optimization method based on an attention mechanism and deep-and-shallow feature processing. The method first uses attention to condense the joint observations of all agents and the global information, removing redundant information; it then applies deep processing to the information about enemy agents and shallow processing to the information about friendly agents and the agent itself; finally, it concatenates the processed features and feeds them into the centralized critic network.
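The global information optimization pipeline in item 2 can be illustrated with a small, untrained sketch in plain Python. This is an illustrative assumption, not the paper's architecture: the entity-list input format, the single learned query used for attention, the choice of three layers for the "deep" enemy branch versus one layer for the "shallow" self/ally branch, and sum-pooling within each group are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relu(v):
    return [max(0.0, x) for x in v]


class Linear:
    def __init__(self, in_dim, out_dim, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.W, self.b)]


class GlobalInfoCritic:
    """Hypothetical sketch: attention reweighting, deep/shallow branches, critic head."""

    def __init__(self, feat_dim, hidden, seed=0):
        rng = random.Random(seed)
        self.query = [rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
        # Deep branch (3 layers) for enemy entities, shallow branch (1 layer) for self/allies.
        self.deep = [Linear(feat_dim, hidden, rng), Linear(hidden, hidden, rng), Linear(hidden, hidden, rng)]
        self.shallow = Linear(feat_dim, hidden, rng)
        self.head = Linear(2 * hidden, 1, rng)
        self.hidden = hidden

    def attend(self, entities):
        # Reweight each entity by attention so low-relevance entries are suppressed.
        # Multiplying by n means uniform attention leaves the features unchanged.
        scores = [sum(q * f for q, f in zip(self.query, feat)) for _, feat in entities]
        n = len(entities)
        return [(kind, [w * n * x for x in feat]) for (kind, feat), w in zip(entities, softmax(scores))]

    def value(self, entities):
        enemy_sum = [0.0] * self.hidden
        friendly_sum = [0.0] * self.hidden
        for kind, feat in self.attend(entities):
            if kind == "enemy":
                h = feat
                for layer in self.deep:
                    h = relu(layer(h))
                enemy_sum = [a + b for a, b in zip(enemy_sum, h)]
            else:  # "self" or "ally"
                h = relu(self.shallow(feat))
                friendly_sum = [a + b for a, b in zip(friendly_sum, h)]
        # Concatenate both feature groups and map them to a scalar state value.
        return self.head(enemy_sum + friendly_sum)[0]


entities = [
    ("self", [1.0, 0.5, 0.0]),
    ("ally", [0.8, 0.2, 0.1]),
    ("enemy", [0.3, 0.9, 1.0]),
    ("enemy", [0.1, 0.4, 0.7]),
]
critic = GlobalInfoCritic(feat_dim=3, hidden=8, seed=1)
print(critic.value(entities))
```

The two branches mirror the paper's description of processing enemy information deeply and friendly and self information shallowly; the scalar returned by `value` stands in for the centralized critic's state-value estimate, which is only used during centralized training.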
Building on these improvements, the paper proposes the Multi-Agent Communication and Global Information Optimization Proximal Policy Optimization (MCGOPPO) algorithm and validates it experimentally in the StarCraft Multi-Agent Challenge (SMAC) and the Multi-Agent Particle Environment (MPE). The experimental results show that these improvements effectively alleviate non-stationarity in multi-agent environments and improve collaborative decision-making among the agents.