Efficient Multi-Agent Exploration with Mutual-Guided Actor-Critic

Renlong Chen, Ying Tan
DOI: https://doi.org/10.1109/CEC53210.2023.10254169
2023-01-01
Abstract: Multi-agent Reinforcement Learning (MARL) has drawn wide attention because many complex real-world scenarios can be abstracted as multi-agent systems. To address the non-local training objective problem in shared-reward environments, value-decomposition-based methods were proposed. Most of them impose an a priori Individual-Global-Max (IGM) constraint together with value-decomposition constraints, and some works tune the value-decomposition constraints to achieve better performance. However, the IGM constraint, the fundamental assumption adopted by most value-decomposition methods, may lead to poor exploration in some situations. To deal with this problem, this paper proposes a novel algorithm called Mutual-guided Multi-agent Actor-Critic (MugAC). Inspired by the core idea of evolutionary computation, MugAC introduces a joint-action pool, from which the critic selects a joint action that is used both to interact with the environment and as a training objective for the actor. This training paradigm provides off-policy training for actor-critic, making its sample efficiency higher than that of traditional actor-critic methods in MARL. We evaluate our method against state-of-the-art methods on StarCraft micromanagement tasks. Experimental results show that MugAC outperforms other methods in various scenarios of the widely adopted StarCraft Multi-Agent Challenge (SMAC).
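The abstract gives only a high-level description of the joint-action pool, so the following is a minimal sketch of one way the idea could look, not the authors' implementation. It assumes that the pool is filled by sampling each agent's action from its own policy, that the critic is a plain scoring function over joint actions, and that the actor is trained with a cross-entropy loss toward the selected joint action. All function names, the toy critic, and the pool size are hypothetical.

```python
import math
import random


def sample_joint_action_pool(policies, pool_size, rng):
    """Build a pool of joint actions (assumed: each agent samples independently)."""
    pool = []
    for _ in range(pool_size):
        joint = tuple(
            rng.choices(range(len(probs)), weights=probs, k=1)[0]
            for probs in policies
        )
        pool.append(joint)
    return pool


def select_by_critic(pool, critic):
    """Return the joint action in the pool that the critic scores highest."""
    return max(pool, key=critic)


def actor_loss(policies, target_joint):
    """Cross-entropy of the agents' policies against the selected joint action."""
    return -sum(math.log(probs[a]) for probs, a in zip(policies, target_joint))


def toy_critic(joint):
    """Hypothetical critic: only the coordinated joint action (1, 1) pays off."""
    return 1.0 if joint == (1, 1) else 0.0


rng = random.Random(0)
policies = [[0.5, 0.5], [0.5, 0.5]]  # two agents, two actions each
pool = sample_joint_action_pool(policies, pool_size=16, rng=rng)
best = select_by_critic(pool, toy_critic)  # would be executed in the environment
loss = actor_loss(policies, best)          # would be minimized by the actor update
```

In this sketch the selected joint action plays both roles named in the abstract: it is the action that would be sent to the environment, and it is the target the actors are pushed toward. Because the target comes from the critic's choice over the pool rather than from the actors' current sample alone, the update does not depend on the action having been drawn from the current policy.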