Abstract: Multi-Agent Reinforcement Learning (MARL) is widely used to solve real-world problems. In MARL, the environment contains multiple agents, and a good grasp of the environment can guide agents to learn cooperative strategies. In Centralized Training Decentralized Execution (CTDE), a centralized critic guides the learning of cooperative strategies. However, having multiple agents in the environment leads to the curse of dimensionality and exposes each agent to the influence of other agents' strategies, which makes it difficult for centralized critics to learn good cooperative strategies. We propose a graph-based approach to overcome these problems. It uses a graph neural network that takes the agents' partial observations as input and aggregates information between agents with graph methods to extract information about the whole environment. In this way, agents can improve their understanding of the overall state of the environment and of the other agents in it while avoiding dimensional explosion. We then combine a dual critic dynamic decomposition method with soft actor-critic to train the policy. The former uses individual and global rewards for learning, avoiding the influence of other agents' strategies, and the latter helps to learn an optimal policy. We call this approach Multi-Agent Graph-based soft Actor-Critic (MAGAC). We compare the proposed method with several classical MARL algorithms in the Multi-agent Particle Environment (MPE). The experimental results show that our method learns faster while also learning a better policy.
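
The aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a fixed, hand-written adjacency matrix and plain mean aggregation with self-loops, whereas a learned graph neural network would also apply trainable weights and nonlinearities. The function name `aggregate_observations` and the toy data are hypothetical. The point it shows is that each agent's aggregated vector keeps the size of a single observation, regardless of how many agents are in the environment.

```python
from typing import List


def aggregate_observations(
    obs: List[List[float]], adjacency: List[List[int]]
) -> List[List[float]]:
    """One round of mean aggregation over a graph of agents.

    Hypothetical helper for illustration only: each agent averages its own
    partial observation with those of its neighbours in the graph.
    """
    n = len(obs)
    dim = len(obs[0])
    out = []
    for i in range(n):
        # The neighbourhood includes the agent itself (self-loop).
        neighbours = [j for j in range(n) if j == i or adjacency[i][j]]
        agg = [
            sum(obs[j][k] for j in neighbours) / len(neighbours)
            for k in range(dim)
        ]
        out.append(agg)
    return out


# Toy example (assumed data): three agents on a line graph 0 - 1 - 2,
# each with a two-dimensional partial observation.
obs = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
embeddings = aggregate_observations(obs, adjacency)
```

Stacking several such rounds would let information from more distant agents reach each agent, which is the usual way graph methods spread information about the whole environment.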

FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning

Expert demonstrations guide reward decomposition for multi-agent cooperation

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

ConcaveQ: Non-Monotonic Value Function Factorization Via Concave Representations in Deep Multi-Agent Reinforcement Learning

Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning

Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning

Priority over Quantity: A Self-Incentive Credit Assignment Scheme for Cooperative Multiagent Reinforcement Learning

QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning

TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

Towards Global Optimality in Cooperative MARL with Sequential Transformation

A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Projection-Optimal Monotonic Value Function Factorization in Multi-Agent Reinforcement Learning

A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning

ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization

DVF: Multi-agent Q-learning with difference value factorization

Special Agents Policy Gradient In Value Decomposition-based Approach

PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning