A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning

Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung
2023-03-01
Abstract: In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered maximum mutual information (MMI)-regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents' actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution. We evaluated VM3-AC on several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
Multiagent Systems, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the coordination problem in multi-agent reinforcement learning (MARL). Specifically, the authors propose a new mutual information (MI) framework that enables multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between the agents' actions and applying a variational bound, they derive a tractable lower bound on the MI-regularized objective function. The method targets a limitation of existing approaches, which learn coordinated behaviors poorly because they ignore the influence of other agents, especially when multiple agents' actions must be coordinated simultaneously. The main contribution is a practical algorithm named Variational Maximum Mutual Information Multi-Agent Actor-Critic (VM3-AC), which follows the principle of Centralized Training with Decentralized Execution (CTDE). Experimental results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
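The two ideas at the core of the summary above, a shared latent variable that induces nonzero mutual information between actions and a variational lower bound on that mutual information, can be illustrated with a small toy example. The sketch below is not VM3-AC and is not taken from the paper. It assumes two agents, two discrete actions each, and a binary latent variable, and it uses the standard variational bound I(A1; A2) >= H(A1) + E[log q(a1 | a2)], which may differ in detail from the objective the authors derive. All function names and probability tables are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def joint_from_latent(p_z, pi1, pi2):
    """Joint p(a1, a2) = sum_z p(z) * pi1(a1 | z) * pi2(a2 | z).

    Each agent picks its action from its own policy given the shared z,
    so the actions are conditionally independent given z.
    """
    n1, n2 = len(pi1[0]), len(pi2[0])
    joint = [[0.0] * n2 for _ in range(n1)]
    for z, pz in enumerate(p_z):
        for a1, a2 in product(range(n1), range(n2)):
            joint[a1][a2] += pz * pi1[z][a1] * pi2[z][a2]
    return joint


def marginals(joint):
    """Return (p(a1), p(a2)) from the joint table joint[a1][a2]."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    return p1, p2


def mutual_information(joint):
    """Exact I(A1; A2) in nats."""
    p1, p2 = marginals(joint)
    mi = 0.0
    for a1, row in enumerate(joint):
        for a2, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p1[a1] * p2[a2]))
    return mi


def variational_lower_bound(joint, q):
    """H(A1) + E_{p(a1,a2)}[log q(a1 | a2)], with q indexed as q[a2][a1].

    This is a lower bound on I(A1; A2) for any valid conditional q,
    and it is tight when q equals the true posterior p(a1 | a2).
    """
    p1, _ = marginals(joint)
    entropy = -sum(p * math.log(p) for p in p1 if p > 0)
    expected_log_q = sum(
        p * math.log(q[a2][a1])
        for a1, row in enumerate(joint)
        for a2, p in enumerate(row)
        if p > 0
    )
    return entropy + expected_log_q


def true_posterior(joint):
    """Exact p(a1 | a2), indexed as post[a2][a1]."""
    _, p2 = marginals(joint)
    return [[joint[a1][a2] / p2[a2] for a1 in range(len(joint))]
            for a2 in range(len(p2))]


p_z = [0.5, 0.5]  # hypothetical prior over the shared latent variable

# Policies that depend on z: both agents lean toward the action matching z.
dependent = [[0.9, 0.1], [0.1, 0.9]]
joint_shared = joint_from_latent(p_z, dependent, dependent)
mi_shared = mutual_information(joint_shared)

# Policies that ignore z: the actions become independent.
ignoring = [[0.9, 0.1], [0.9, 0.1]]
joint_indep = joint_from_latent(p_z, ignoring, ignoring)
mi_indep = mutual_information(joint_indep)

# An imperfect guess at p(a1 | a2) gives a loose bound; the true posterior
# makes the bound tight.
q_guess = [[0.7, 0.3], [0.3, 0.7]]
bound_loose = variational_lower_bound(joint_shared, q_guess)
bound_tight = variational_lower_bound(joint_shared, true_posterior(joint_shared))

print("MI with shared latent:", mi_shared)
print("MI without latent dependence:", mi_indep)
print("Loose variational bound:", bound_loose)
print("Tight variational bound:", bound_tight)
```

In this toy setting the mutual information is positive only when both policies condition on the shared latent variable. The bound splits into an entropy term and a term that rewards predicting one agent's action from the other's, which matches the abstract's reading of the objective as maximum entropy reinforcement learning combined with uncertainty reduction of other agents' actions.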