Unlocking the Potential of MAPPO with Asynchronous Optimization

Wei Fu, Chao Yu, Yunfei Li, Yi Wu
DOI: https://doi.org/10.1007/978-3-030-93049-3_33
IF: 14.4
2021-01-01
Artificial Intelligence
Abstract: It is almost a consensus that off-policy algorithms dominate research benchmarks of multi-agent reinforcement learning (MARL), while recent work [34] demonstrates that an on-policy MARL algorithm, Multi-Agent Proximal Policy Optimization (MAPPO), can also attain comparable performance. In this paper, we propose a training framework based on MAPPO, named async-MAPPO, which supports scalable asynchronous training. We further re-examine async-MAPPO in the StarCraft II micromanagement domain and obtain state-of-the-art performance on several hard and super-hard maps. Finally, we analyze three experimental phenomena and provide hypotheses for the performance improvement of async-MAPPO.
computer science, artificial intelligence