Proximal Policy Optimization Based Decentralized Networked Multi-Agent Reinforcement Learning

Jinyi Liu, Fangyu Li, Jingjing Wang, Honggui Han
DOI: https://doi.org/10.1109/icca62789.2024.10591960
2024-01-01
Abstract: Networked multi-agent reinforcement learning (NMARL) is widely used in multi-agent systems (MASs). However, most existing NMARL algorithms share the global state and reward, which hinders their scalability to large-scale MASs. To make NMARL applicable to large-scale MASs, we propose a fully decentralized NMARL algorithm based on proximal policy optimization (PPO). First, we design a fully decentralized multi-agent reinforcement learning (MARL) framework, formulated as a networked partially observable multi-agent Markov decision process (N-POMDP). The networked MAS is represented by a graph in which each agent communicates with, and shares rewards only with, its neighbors. Second, we design a gated recurrent unit (GRU) based communication strategy to learn temporal correlations in communication. Each agent exchanges observation information and hidden states with its neighbors. Finally, we conduct experiments in the multi-agent particle environment (MPE) and compare our algorithm with common MARL algorithms. The experimental results show that our algorithm achieves strong performance in terms of both cumulative return and convergence speed in large-scale MASs.
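To make the two core ingredients of the abstract concrete, the following is a minimal sketch of neighbor-only reward sharing on a graph and of the standard PPO clipped surrogate objective. The abstract does not specify how rewards are aggregated across neighbors, so the averaging rule, the function names (`neighborhood_rewards`, `ppo_clipped_objective`), and the example graph are illustrative assumptions rather than the authors' implementation. NumPy is assumed to be available.

```python
import numpy as np


def neighborhood_rewards(adjacency, rewards, include_self=True):
    """Average each agent's reward over its graph neighborhood.

    Assumption: mean aggregation over the agent and its direct neighbors.
    The paper may use a different sharing rule.
    """
    mask = np.asarray(adjacency, dtype=float).copy()
    rewards = np.asarray(rewards, dtype=float)
    # Decide whether an agent's own reward counts toward its local reward.
    np.fill_diagonal(mask, 1.0 if include_self else 0.0)
    counts = mask.sum(axis=1)
    counts = np.where(counts == 0, 1.0, counts)  # guard isolated agents
    return mask @ rewards / counts


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(np.asarray(new_logp, dtype=float) - np.asarray(old_logp, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))


# Hypothetical three-agent line graph: 0 - 1 - 2
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
rewards = [1.0, 2.0, 6.0]

local = neighborhood_rewards(adjacency, rewards)
print(local)  # each agent sees only its own and its neighbors' rewards
```

In a fully decentralized setup of this kind, each agent would compute its advantages from its local reward and optimize its own clipped objective, so no global state or global reward is required.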