Abstract: In cooperative multi-agent tasks, a team of agents jointly interacts with an environment by taking actions, receiving a team reward, and observing the next state. During these interactions, uncertainty in the environment and the reward inevitably induces stochasticity in the long-term returns, and this randomness can be exacerbated as the number of agents increases. However, most existing value-based multi-agent reinforcement learning (MARL) methods model only the expectations of the individual Q-values and the global Q-value, ignoring such randomness. Rather than modeling only the expectations of the long-term returns, it is preferable to model the stochasticity directly by estimating the returns as distributions. With this motivation, this work proposes DQMIX, a novel value-based MARL method, from a distributional perspective. Specifically, we model each individual Q-value with a categorical distribution. To integrate these individual Q-value distributions into the global Q-value distribution, we design a distribution mixing network built from five basic operations on distributions. We further prove that DQMIX satisfies the Distributional-Individual-Global-Max (DIGM) principle with respect to the expectation of the distribution, which guarantees consistency between joint and individual greedy action selections in the global Q-value and the individual Q-values. To validate DQMIX, we demonstrate its ability to factorize a matrix game with stochastic rewards. Furthermore, experimental results on a challenging set of StarCraft II micromanagement tasks show that DQMIX consistently outperforms the value-based MARL baselines.
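To make the consistency property concrete, the following minimal sketch represents each individual Q-value as a categorical distribution over a fixed set of atoms and compares individual greedy actions with the joint greedy action. It is an illustration under stated assumptions, not the DQMIX mixing network: the atom support `ATOMS`, the helper names, and the additive (sum-of-returns) mixing are all hypothetical stand-ins chosen because additivity makes the expectation of the global distribution equal to the sum of the individual expectations.

```python
import itertools
import numpy as np

# Hypothetical shared support (atoms) for every categorical distribution.
ATOMS = np.linspace(-1.0, 1.0, 5)


def expected_value(probs):
    """Expectation of one categorical distribution defined over ATOMS."""
    return float(np.dot(probs, ATOMS))


def greedy_action(action_probs):
    """Greedy action of one agent with respect to the expectation.

    action_probs has shape (n_actions, n_atoms); each row sums to 1.
    """
    return int(np.argmax(action_probs @ ATOMS))


def joint_greedy_additive(per_agent_probs):
    """Brute-force joint greedy action under additive mixing (an assumption).

    With additive mixing, the expected global value of a joint action is
    the sum of the agents' expected individual values.
    """
    expectations = [p @ ATOMS for p in per_agent_probs]
    joint_actions = itertools.product(*[range(len(e)) for e in expectations])
    best = max(
        joint_actions,
        key=lambda joint: sum(e[a] for e, a in zip(expectations, joint)),
    )
    return tuple(best)


# Two agents, three actions each, with randomly drawn distributions.
rng = np.random.default_rng(0)
agents = [rng.dirichlet(np.ones(len(ATOMS)), size=3) for _ in range(2)]

individual = tuple(greedy_action(p) for p in agents)
joint = joint_greedy_additive(agents)
print("individual greedy:", individual)
print("joint greedy:     ", joint)
```

In this simplified setting the two selections coincide, which is the kind of consistency, stated with respect to expectations, that the DIGM principle requires; the abstract claims this property for the full distribution mixing network, whose five operations are not reproduced here.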

Value function factorization with dynamic weighting for deep multi-agent reinforcement learning

POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

On Stateful Value Factorization in Multi-Agent Reinforcement Learning

DVF: Multi-agent Q-learning with difference value factorization

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

Expert demonstrations guide reward decomposition for multi-agent cooperation

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning

DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning

Learning Nearly Decomposable Value Functions Via Communication Minimization

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

Learning Multi-Agent Cooperation via Considering Actions of Teammates

Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning

Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients

QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition

Factorized Q-learning for Large-Scale Multi-Agent Systems

A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization

Regularized Softmax Deep Multi-Agent Q-Learning

An Overestimation Reduction Method Based on the Multi-step Weighted Double Estimation Using Value-Decomposition Multi-agent Reinforcement Learning