Abstract: In multi-agent environments, the mainstream way to alleviate non-stationarity is the Centralized Training Decentralized Execution (CTDE) framework. Building on CTDE, this thesis studies cooperative multi-agent decision-making based on the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm. To further alleviate non-stationarity, it introduces a multi-agent communication mechanism based on weight scheduling and an attention module. By exchanging information, agents can mitigate the non-stationarity caused by local observations, which assists their collaborative decision-making. Specifically, a communication module is added to the policy network. The module consists of a weight generator, a weight scheduler, a message encoder, a message pool, and an attention module. The weight generator and weight scheduler produce weights that serve as the basis for selecting communication; the message encoder compresses and encodes the communicated information; the message pool stores the messages; and the attention module handles the interaction between an agent's own information and the communicated information. The thesis proposes the Multi-Agent Communication and Global Information Optimization Proximal Policy Optimization (MCGOPPO) algorithm and evaluates it in the StarCraft Multi-Agent Challenge (SMAC) and the Multi-Agent Particle Environment (MPE). The experimental results show that the improvements have some effect: they better alleviate the non-stationarity of the multi-agent environment and improve the collaborative decision-making ability of the agents.
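The data flow through the communication module described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the linear weight generator, the top-k scheduling rule, the linear message encoder, the single-head dot-product attention, and every name (`communicate`, `schedule`, `attend`, `w_gen`, `W_enc`, `k`) are assumptions made for illustration only.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    return [dot(row, v) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def schedule(weights, k):
    """Weight scheduler (ASSUMED top-k rule): indices of the k largest weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(order[:k])


def attend(query, messages):
    """Scaled dot-product attention of one query over the pooled messages."""
    scale = math.sqrt(len(query))
    alphas = softmax([dot(query, m) / scale for m in messages])
    dim = len(messages[0])
    context = [sum(a * m[j] for a, m in zip(alphas, messages)) for j in range(dim)]
    return context, alphas


def communicate(observations, w_gen, W_enc, k):
    # Weight generator (ASSUMED linear): one scalar weight per agent.
    weights = [dot(w_gen, obs) for obs in observations]
    senders = schedule(weights, k)
    # Message encoder compresses each selected observation; the list is the message pool.
    pool = [matvec(W_enc, observations[i]) for i in senders]
    outputs = []
    for obs in observations:
        own = matvec(W_enc, obs)        # the agent's own encoded information
        context, _ = attend(own, pool)  # attention over the message pool
        outputs.append(own + context)   # list concatenation (ASSUMED fusion)
    return senders, outputs


# Toy usage with random data: 4 agents, 6-dim observations, 3-dim messages.
rng = random.Random(0)
n_agents, obs_dim, msg_dim = 4, 6, 3
observations = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
w_gen = [rng.uniform(-1, 1) for _ in range(obs_dim)]
W_enc = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(msg_dim)]
senders, policy_inputs = communicate(observations, w_gen, W_enc, k=2)
```

In this sketch each agent's policy input is its own encoding concatenated with an attention-weighted summary of the pooled messages; how the thesis actually fuses the two is not stated in the abstract.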

What problem does this paper attempt to address?

This paper attempts to solve the problem of environmental non-stationarity in multi-agent environments caused by the continuous change of each agent's strategy, which poses a challenge to the cooperative decision-making among multiple agents. Specifically, the paper mainly solves the following problems:

1. **Introduction of multi-agent communication mechanism**: In order to alleviate the non-stationarity in multi-agent environments, the paper proposes a multi-agent communication mechanism based on weight scheduling and attention modules. Through information exchange and sharing among agents, the non-stationarity caused by local observations can be reduced, thus assisting in the collaborative decision-making among agents. This mechanism includes a communication selection module (message encoder, weight generator and weight scheduler) and a message processing module (attention module).
2. **Optimized processing of global information**: In the CTDE framework, global information is introduced in the centralized training phase to alleviate the non-stationarity of the environment. However, the MAPPO algorithm has certain redundancies when processing global information. For this reason, the paper proposes a global information optimization method based on the attention mechanism and deep-and-shallow feature processing. This method first simplifies the joint observation information and global information of all agents through the attention mechanism to remove redundant information, then deeply processes the information of enemy agents, shallowly processes the information of friendly agents and itself, and finally concatenates the processed features and inputs them into the centralized Critic network.

Through the above improvements, the paper proposes the multi-agent communication and global information-optimized proximal policy optimization (MCGOPPO) algorithm and conducts experimental verification in the StarCraft Multi-Agent Challenge (SMAC) and the Multi-Agent Particle Environment (MPE). The experimental results show that these improvements can effectively alleviate the non-stationarity in multi-agent environments and improve the collaborative decision-making ability among agents.
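The deep-and-shallow processing of critic inputs described in item 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's network: attention pooling with the agent's own features as the query stands in for the redundancy-removal step, "deep" is taken to mean two ReLU layers and "shallow" one, all entities are assumed to share one feature size, and the names (`critic_features`, `attention_pool`, `init_layer`, `params`) are hypothetical.

```python
import math
import random


def linear(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp(layers, x):
    """Apply a stack of linear + ReLU layers."""
    for W, b in layers:
        x = [max(0.0, v) for v in linear(W, b, x)]
    return x


def init_layer(rng, n_in, n_out):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def attention_pool(query, entities):
    """Collapse several entity vectors into one, weighted by similarity to query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * e for q, e in zip(query, ent)) / scale for ent in entities]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(entities[0])
    return [sum((e / total) * ent[j] for e, ent in zip(exps, entities))
            for j in range(dim)]


def critic_features(own, allies, enemies, params):
    # Step 1 (ASSUMED form): attention condenses each group into one summary.
    ally_summary = attention_pool(own, allies)
    enemy_summary = attention_pool(own, enemies)
    # Step 2: enemy information goes through the deeper stack (two layers here).
    deep_enemy = mlp(params["enemy"], enemy_summary)
    # Step 3: friendly and own information go through a single layer.
    shallow_ally = mlp(params["ally"], ally_summary)
    shallow_own = mlp(params["own"], own)
    # Step 4: concatenate; the result would feed the centralized critic.
    return shallow_own + shallow_ally + deep_enemy


# Toy usage with random data: 2 allies, 3 enemies, 4 features per entity.
rng = random.Random(0)
feat, hidden = 4, 8
params = {
    "enemy": [init_layer(rng, feat, hidden), init_layer(rng, hidden, hidden)],
    "ally": [init_layer(rng, feat, hidden)],
    "own": [init_layer(rng, feat, hidden)],
}
own = [rng.uniform(-1, 1) for _ in range(feat)]
allies = [[rng.uniform(-1, 1) for _ in range(feat)] for _ in range(2)]
enemies = [[rng.uniform(-1, 1) for _ in range(feat)] for _ in range(3)]
critic_input = critic_features(own, allies, enemies, params)
```

Because each group is pooled to a fixed-size summary before the layers are applied, the critic input length in this sketch (3 × `hidden`) does not grow with the number of allies or enemies.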

Research on Multi-Agent Communication and Collaborative Decision-Making Based on Deep Reinforcement Learning
