Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Dingyang Chen, Qi Zhang

2023-06-03

Abstract: Executing actions in a correlated manner is a common strategy for human coordination that often leads to better cooperation, which is also potentially beneficial for cooperative multi-agent reinforcement learning (MARL). However, the recent success of MARL relies heavily on the convenient paradigm of purely decentralized execution, where there is no action correlation among agents for scalability considerations. In this work, we introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy. We establish a theoretical justification for why action dependencies are beneficial by deriving the multi-agent policy gradient formula under such a Bayesian network joint policy and proving its global convergence to Nash equilibria under tabular softmax policy parameterization in cooperative Markov games. Further, by equipping existing MARL algorithms with a recent method of differentiable directed acyclic graphs (DAGs), we develop practical algorithms to learn the context-aware Bayesian network policies in scenarios with partial observability and varying difficulty. We also dynamically decrease the sparsity of the learned DAG throughout the training process, which leads to weakly or even purely independent policies for decentralized execution. Empirical results on a range of MARL benchmarks show the benefits of our approach.

Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory, Machine Learning

What problem does this paper attempt to address?

This paper addresses how to introduce action correlations among agents in cooperative multi-agent reinforcement learning (MARL) while keeping the algorithm scalable. Traditional MARL methods usually adopt independent policies: each agent selects its action based only on its own observations. This is convenient for systems with many agents, but it ignores potentially beneficial coordination between them. The paper proposes using a Bayesian network (BN) to correlate the agents' action selections, which introduces dependencies into the joint policy to improve cooperation.

The main contributions of the paper include:

1. **Theoretical contribution**: The authors formalize the use of a Bayesian network as a joint policy in cooperative Markov games, derive the corresponding BN policy gradient formula, and prove global convergence to Nash equilibria under tabular softmax policy parameterization.
2. **Practical algorithm**: The authors extend existing MARL algorithms (such as MAPPO) with differentiable directed acyclic graph (DAG) learning. By dynamically adjusting the sparsity of the learned DAG during training, the method arrives at weakly or even fully independent policies, which keeps it compatible with the centralized training, decentralized execution (CTDE) paradigm.

Together, these contributions aim to improve performance on cooperative tasks through action correlations while preserving the scalability of MARL algorithms.
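To make the idea of a Bayesian network joint policy concrete, here is a minimal sketch of sampling a joint action when each agent's action distribution may depend on the actions of its parent agents in a DAG. It is an illustration under stated assumptions, not the authors' implementation: the names `topological_order`, `sample_joint_action`, `logits_fn`, and `toy_logits` are hypothetical, the DAG is a fixed adjacency matrix rather than a learned, context-dependent one, and the per-agent policy is a plain softmax over hand-written logits.

```python
import numpy as np


def topological_order(adj):
    """Return agent indices so that every parent precedes its children.

    adj[j, i] == 1 means agent i's action depends on agent j's action.
    Raises ValueError if the graph contains a cycle.
    """
    n = adj.shape[0]
    in_degree = adj.sum(axis=0).astype(int)  # number of parents per agent
    ready = [i for i in range(n) if in_degree[i] == 0]
    order = []
    while ready:
        j = ready.pop()
        order.append(j)
        for i in np.flatnonzero(adj[j]):  # children of agent j
            in_degree[i] -= 1
            if in_degree[i] == 0:
                ready.append(int(i))
    if len(order) != n:
        raise ValueError("dependency graph is not a DAG")
    return order


def sample_joint_action(logits_fn, adj, state, rng):
    """Sample a joint action agent by agent, following the DAG.

    logits_fn(agent, state, parent_actions) is an assumed callable that
    returns unnormalized log-probabilities over that agent's actions.
    Returns the joint action and its log-probability, which is the sum of
    the per-agent conditional log-probabilities.
    """
    n = adj.shape[0]
    actions = [None] * n
    log_prob = 0.0
    for i in topological_order(adj):
        parents = [int(j) for j in np.flatnonzero(adj[:, i])]
        parent_actions = tuple(actions[j] for j in parents)
        logits = np.asarray(logits_fn(i, state, parent_actions), dtype=float)
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        a = int(rng.choice(len(probs), p=probs))
        actions[i] = a
        log_prob += float(np.log(probs[a]))
    return actions, log_prob


def toy_logits(agent, state, parent_actions):
    """Toy policy: an agent with a parent strongly prefers to copy it."""
    logits = np.zeros(2)
    if parent_actions:
        logits[parent_actions[0]] = 20.0
    return logits


adj = np.array([[0, 1],
                [0, 0]])  # single edge: agent 0 -> agent 1
rng = np.random.default_rng(0)
actions, log_prob = sample_joint_action(toy_logits, adj, state=None, rng=rng)
```

With an all-zero adjacency matrix, no agent has parents and the same routine reduces to independent per-agent sampling, which corresponds to the fully decentralized case described above.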
