APC: Predict Global Representation from Local Observation in Multi-Agent Reinforcement Learning

Xiaoyang Li, Guohua Yang, Dawei Zhang, Jianhua Tao
DOI: https://doi.org/10.1109/ijcnn60899.2024.10651100
2024-01-01
Abstract: Multi-agent reinforcement learning (MARL) algorithms with sequential decision-making strategies have recently achieved great success in cooperative tasks. To overcome the non-stationarity problem, these methods use a centralized controller that takes the global observation as input and chooses an action for each agent in sequence. In most scenarios, however, global information is available only at training time. At execution time, agents act synchronously on their local observations, which prevents them from exploiting the additional information for cooperation. In this paper, we propose the actor-predicts-critic (APC) algorithm. APC is built on the actor-critic architecture, and its actor learns to predict the centralized critic's global representations from local observations. During training, the actor receives the critic's estimated state values. It also uses the critic's representations, which are extracted from global information, as prediction targets. These global representations are closely related to the agents' goals and rewards, so agents can cooperate better on MARL tasks by using the predicted representations. To validate APC, we evaluate it on the StarCraft II, Google Research Football, and Multi-Agent MuJoCo benchmarks. The results show that APC significantly outperforms strong baselines in the centralized training and decentralized execution (CTDE) framework, including MAT-Dec, MAPPO, and fine-tuned QMIX.
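The abstract does not specify the form of the prediction objective. The sketch below is minimal and rests on several assumptions: a mean-squared-error prediction loss, single-layer linear encoders, and an additive weighting of the two terms. The names `encode`, `prediction_loss`, `apc_actor_loss`, and the weight `beta` are hypothetical. It only illustrates the general idea. The actor computes a representation from its local observation, and a training term pulls that representation toward the critic's representation, which is computed from global information. It is not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def encode(weights: Matrix, x: Sequence[float]) -> Vector:
    """Hypothetical single-layer linear encoder: returns W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def prediction_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed MSE between the actor's prediction and the critic's representation."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def apc_actor_loss(
    policy_loss: float,
    actor_weights: Matrix,
    local_obs: Sequence[float],
    critic_weights: Matrix,
    global_state: Sequence[float],
    beta: float = 0.5,
) -> float:
    """Policy loss plus a weighted representation-prediction term (assumed form)."""
    # The actor sees only its local observation...
    predicted = encode(actor_weights, local_obs)
    # ...while the critic's representation is built from global information.
    # In an autodiff framework this target would typically be detached (stop-gradient).
    target = encode(critic_weights, global_state)
    return policy_loss + beta * prediction_loss(predicted, target)


if __name__ == "__main__":
    actor_w = [[1.0, 0.0], [0.0, 1.0]]
    critic_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    loss = apc_actor_loss(0.25, actor_w, [1.0, 2.0], critic_w, [1.0, 5.0, 4.0])
    print(loss)  # 1.25
```

In this toy example, the actor predicts `[1.0, 2.0]` and the critic's representation is `[1.0, 4.0]`. The prediction loss is therefore 2.0, and the combined loss is 0.25 + 0.5 × 2.0 = 1.25. Under this sketch's assumptions, only the actor encoder would be needed at execution time, and it requires only local observations. That property is consistent with the CTDE setting the abstract describes.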