MAPPO method based on attention behavior network

Haoyu Huang, Tianmao Chen, Huixia Wang, Ruiguang Hu, Wuyi Luo, Zheng Yao
DOI: https://doi.org/10.1109/ISCTech58360.2022.00054
2022-01-01
Abstract: To accelerate policy convergence in multi-agent reinforcement learning, this paper improves the agent actor network in MAPPO by drawing on the attention mechanism of the Transformer model. The actor's feature extraction from environmental observations is changed from a linear network to an attention mechanism that extracts high-dimensional features and learns the sequence information among values within fixed attribute regions of the observation. This strengthens the correlation among the agent's own information, friendly-unit information, and environmental information in the observation and yields a clearer policy distribution, which improves training efficiency in an environment of fixed complexity, accelerates policy convergence, and produces a reasonable multi-agent policy. The modified method is trained and tested in the SMAC multi-agent reinforcement learning environment. Experimental results show that, compared with the MAPPO algorithm, the policy converges faster, and the converged game policy achieves an average win rate of 95%, which meets the performance requirements of the multi-agent algorithm.
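The idea of replacing a linear feature extractor with attention over observation segments can be illustrated with a small sketch. This is not the authors' network. It assumes the observation has already been split into three equal-length groups (own, friendly, and environment features) and applies a single head of scaled dot-product self-attention with random, untrained weights. The grouping, all dimensions, and the names `self_attention` and `actor_policy` are hypothetical choices made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over observation groups."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Each row says how strongly one group attends to every group.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def actor_policy(obs_groups, params):
    """Map grouped observations to a distribution over discrete actions."""
    attended, weights = self_attention(
        obs_groups, params["w_q"], params["w_k"], params["w_v"]
    )
    flat = [x for row in attended for x in row]
    logits = matmul([flat], params["w_out"])[0]
    return softmax(logits), weights


# Assumed sizes: 3 groups of 4 features, model width 8, 5 discrete actions.
rng = random.Random(0)
feat_dim, d_model, n_actions, n_groups = 4, 8, 5, 3
obs_groups = [[rng.random() for _ in range(feat_dim)] for _ in range(n_groups)]
params = {
    "w_q": random_matrix(feat_dim, d_model, rng),
    "w_k": random_matrix(feat_dim, d_model, rng),
    "w_v": random_matrix(feat_dim, d_model, rng),
    "w_out": random_matrix(n_groups * d_model, n_actions, rng),
}
probs, weights = actor_policy(obs_groups, params)
```

Here `probs` is the action distribution the actor would sample from, and each row of `weights` shows how one observation group attends to the others. In a real MAPPO setup these parameters would be trained with the PPO objective, and the split of the SMAC observation vector would follow the environment's actual layout rather than the equal-length groups assumed here.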