A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning

Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

2023-03-01

Abstract: In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered maximum mutual information (MMI) regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents' actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution. We evaluated VM3-AC on several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
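As a hedged illustration of the variational bound mentioned in the abstract, the standard lower bound on mutual information can be written out as below. The notation is an assumption made for this sketch and is not taken from the paper: a^i and a^j are the actions of two agents, s is the state, p is the true conditional distribution, and q is any approximating distribution. The paper's exact objective may differ in its details.

```latex
% Notation assumed for illustration: a^i, a^j are two agents' actions,
% s is the state, p the true conditional, q any variational approximation.
\begin{align}
I(a^i; a^j \mid s)
  &= H(a^i \mid s) - H(a^i \mid a^j, s) \\
  &= H(a^i \mid s) + \mathbb{E}\big[\log p(a^i \mid a^j, s)\big] \\
  &= H(a^i \mid s) + \mathbb{E}\big[\log q(a^i \mid a^j, s)\big]
     + \mathbb{E}\big[ D_{\mathrm{KL}}\big(p(\cdot \mid a^j, s) \,\|\, q(\cdot \mid a^j, s)\big)\big] \\
  &\ge H(a^i \mid s) + \mathbb{E}\big[\log q(a^i \mid a^j, s)\big].
\end{align}
```

The inequality holds because the KL divergence is nonnegative. Read this way, the first term is an entropy bonus of the kind used in maximum entropy reinforcement learning, and the second term rewards actions that make the other agent's action easier to predict, which matches the interpretation given in the abstract.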

Multiagent Systems, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the coordination problem in multi-agent reinforcement learning (MARL). Specifically, the authors propose a new mutual information (MI) framework that enables multiple agents to learn coordinated behaviors by regularizing the cumulative return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between the agents' actions and applying a variational bound, they derive a tractable lower bound on the MI-regularized objective function. The method aims to overcome the limited ability of existing methods to learn coordinated behaviors, which stems from ignoring the influence of other agents, especially when the actions of multiple agents must be coordinated simultaneously. The main contribution is a practical algorithm named Variational Maximum Mutual Information Multi-Agent Actor-Critic (VM3-AC), which follows Centralized Training with Decentralized Execution (CTDE). Experimental results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination.
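The role of the latent variable and the variational bound can be checked numerically on a toy problem. The sketch below is not the VM3-AC algorithm: it assumes a single state, two agents with binary actions, and a shared binary latent variable z, and all function names and probability tables are hypothetical choices made for illustration. It shows that policies ignoring z give zero mutual information between actions, that policies conditioned on z give positive mutual information, and that the variational quantity never exceeds the true mutual information.

```python
import numpy as np

def joint_from_latent(p_z, pi1, pi2):
    """p(a1, a2) = sum_z p(z) * pi1(a1|z) * pi2(a2|z); policies are indexed [z, a]."""
    return np.einsum("z,za,zb->ab", p_z, pi1, pi2)

def entropy(p):
    """Shannon entropy in nats of a 1-D probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(joint):
    """I(a1; a2) = H(a1) + H(a2) - H(a1, a2) for a joint table indexed [a1, a2]."""
    p1 = joint.sum(axis=1)
    p2 = joint.sum(axis=0)
    return entropy(p1) + entropy(p2) - entropy(joint.ravel())

def variational_lower_bound(joint, q):
    """H(a1) + E[log q(a1|a2)], with q indexed [a2, a1] and strictly positive."""
    p1 = joint.sum(axis=1)
    expected_log_q = float((joint * np.log(q.T)).sum())
    return entropy(p1) + expected_log_q

# Hypothetical toy setup: uniform shared latent, binary actions.
p_z = np.array([0.5, 0.5])
pi_follow_z = np.array([[0.9, 0.1], [0.1, 0.9]])  # action mostly copies z
pi_ignore_z = np.array([[0.5, 0.5], [0.5, 0.5]])  # action independent of z

joint_ind = joint_from_latent(p_z, pi_ignore_z, pi_ignore_z)
joint_corr = joint_from_latent(p_z, pi_follow_z, pi_follow_z)

mi_ind = mutual_information(joint_ind)
mi_corr = mutual_information(joint_corr)

# An arbitrary guess for q(a1|a2), and the exact conditional p(a1|a2).
q_guess = np.array([[0.7, 0.3], [0.3, 0.7]])
q_true = (joint_corr / joint_corr.sum(axis=0, keepdims=True)).T

bound_guess = variational_lower_bound(joint_corr, q_guess)
bound_true = variational_lower_bound(joint_corr, q_true)

print("MI without latent dependence:", mi_ind)
print("MI with shared latent:", mi_corr)
print("Variational bound (guessed q):", bound_guess)
print("Variational bound (exact q):", bound_true)
```

In this toy setting the bound becomes tight when q equals the exact conditional distribution, which is why improving the variational approximation and improving the policies can both raise the lower bound.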
