MAPPG: Multi-agent Phasic Policy Gradient

Qi Zhang, Xuetao Zhang, Yisha Liu, Xuebo Zhang, Yan Zhuang
DOI: https://doi.org/10.1109/CDC49753.2023.10383338
2023-01-01
Abstract: We propose Multi-Agent Phasic Policy Gradient (MAPPG), an algorithm that helps agents further alleviate the non-stationarity of the environment. Unlike existing methods, MAPPG introduces an auxiliary phase that trains the local policy directly on the environment state, and this phase can be naturally integrated into other algorithms. Specifically, we propose hidden-layer feature sharing, which for the first time shares features between the global value network and the local policy network. In addition, mirror descent is used to update the policy iteratively in the auxiliary phase, which makes the policy update more robust. In evaluations on the Multi-Agent Particle and Multi-Agent MuJoCo benchmark environments, our method achieves higher rewards than state-of-the-art baselines.
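To make the mirror-descent idea concrete, the following is a minimal sketch of a single KL-regularized mirror descent step on a categorical policy, which reduces to the well-known exponentiated-gradient update. It is not the paper's implementation: the abstract does not state the exact update rule, and the function name `mirror_descent_step`, the `step_size` parameter, and the toy advantage values are assumptions chosen only for illustration.

```python
import math

def mirror_descent_step(policy, advantages, step_size):
    """One mirror descent step on a categorical policy (illustrative only).

    Assumes every probability in `policy` is strictly positive.
    With KL divergence as the proximity term, the step has the closed form
    new(a) proportional to old(a) * exp(step_size * advantage(a)).
    """
    # Work in log space: log old(a) + step_size * advantage(a).
    logits = [math.log(p) + step_size * adv for p, adv in zip(policy, advantages)]
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    # Normalize so the result is a valid probability distribution.
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: three actions with fixed advantage estimates.
policy = [0.25, 0.25, 0.5]
advantages = [1.0, 0.0, -1.0]
for _ in range(3):
    policy = mirror_descent_step(policy, advantages, step_size=0.5)
```

Each step moves probability mass toward actions with higher advantage while the KL term keeps the new policy close to the previous one, which is one common reading of why such updates are described as robust.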