Multi-agent Cooperative Games Using Belief Map Assisted Training

Qinwei Huang, Chen Luo, Alex B. Wu, Simon Khan, Hai Li, Qinru Qiu
DOI: https://doi.org/10.3233/FAIA230444
2024-06-28
Abstract: In a multi-agent system, agents share their local observations through a message passing system to gain global situational awareness for decision making and collaboration. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learning (RL), the message passing system needs to be optimized together with the agent policies. This increases the model's complexity and poses significant challenges to the convergence and performance of learning. To address this issue, we propose the Belief-map Assisted Multi-agent System (BAMS), which leverages a neuro-symbolic belief map to enhance training. The belief map decodes the agent's hidden state to provide a symbolic representation of the agent's understanding of the environment and other agents' status. The simplicity of the symbolic representation allows the ground truth information to be gathered and compared with the belief, which provides an additional channel of feedback for learning. Compared to the sporadic and delayed feedback coming from the reward in RL, the feedback from the belief map is more consistent and reliable. Agents using BAMS can learn a more effective message passing network to better understand each other, resulting in better performance in the game. We evaluate BAMS in a cooperative predator and prey game with varying levels of map complexity and compare it to previous multi-agent message passing models. The simulation results showed that BAMS reduced training epochs by 66%, and agents that apply the BAMS model completed the game with 34.62% fewer steps on average.
Multiagent Systems, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to optimize the message-passing mechanism among agents in a multi-agent system so that they make better decisions and collaborate more effectively in cooperative games. Specifically, it focuses on three questions in multi-agent reinforcement learning (MARL): when agents should send messages, how messages should be encoded, and how received messages should be used. These factors directly affect the effectiveness of collaboration among agents. Because the message-passing system must be optimized jointly with the agent policies, the model becomes more complex, which poses challenges to the convergence and performance of learning.
To address these issues, the authors propose the Belief-map Assisted Multi-agent System (BAMS). BAMS enhances training by introducing a neuro-symbolic belief map that decodes the hidden states of agents into a symbolic representation of each agent's understanding of the environment and of other agents' states. Because this representation is simple, ground truth information can be gathered and compared with the belief, which provides an additional feedback channel for learning. Compared to the sporadic and delayed rewards in reinforcement learning, feedback from the belief map is more consistent and reliable, helping agents learn more effective message-passing networks, understand each other better, and perform better in the game.
The main contributions of the paper include:
1. Proposing a belief-map assisted training mechanism that supplements reinforcement learning with supervised information, accelerating training convergence.
2. Designing a belief map decoder that reconstructs a neuro-symbolic map from environmental embeddings, providing additional feedback for training. This map translates the hidden states of agents into a human-readable format, significantly improving the interpretability of the agents' decision-making process.
3. Training agents using the BAMS model to communicate more effectively, capture prey faster, and be less sensitive to external interference as the number of agents increases.
4. Showing through simulation that agents with these improvements can be effectively trained in large and complex environments, with training epochs reduced by 66% and games completed with 34.62% fewer steps on average.
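The idea of supplementing the RL objective with supervised feedback from a belief map can be illustrated with a minimal sketch. It is not the authors' implementation: the summary above does not state the exact loss, so the per-cell binary cross-entropy, the additive combination, and the names `belief_map_loss`, `combined_loss`, and `aux_weight` are all assumptions made for illustration. The belief map is modeled as a grid of probabilities (for example, "prey is in this cell") and the ground truth as a grid of 0/1 values.

```python
import math


def belief_map_loss(belief, truth):
    """Mean binary cross-entropy between a decoded belief map and the truth.

    belief: 2D list of probabilities, one per grid cell (hypothetical format).
    truth:  2D list of 0/1 ground-truth values with the same shape.
    """
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total, count = 0.0, 0
    for belief_row, truth_row in zip(belief, truth):
        for p, t in zip(belief_row, truth_row):
            p = min(max(p, eps), 1.0 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def combined_loss(rl_loss, belief, truth, aux_weight=0.5):
    """RL loss plus a weighted supervised belief-map term.

    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    return rl_loss + aux_weight * belief_map_loss(belief, truth)


if __name__ == "__main__":
    truth = [[0, 1], [0, 0]]             # prey is in the top-right cell
    uninformed = [[0.5, 0.5], [0.5, 0.5]]
    informed = [[0.1, 0.9], [0.1, 0.1]]
    # A belief closer to the ground truth yields a smaller auxiliary loss.
    print(belief_map_loss(uninformed, truth))
    print(belief_map_loss(informed, truth))
    print(combined_loss(1.0, informed, truth))
```

Unlike the reward, which may arrive only when the prey is captured, the auxiliary term in this sketch is available at every step at which the ground-truth map can be gathered, which is the sense in which the belief-map feedback is described as more consistent.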