Abstract: A great deal of multi-agent reinforcement learning (MARL) work has investigated how multiple agents can effectively accomplish cooperative tasks using value function decomposition methods. However, existing value decomposition methods can only handle cooperative tasks with a shared reward, because they factorize the value function from a global perspective. To tackle competitive tasks and mixed cooperative-competitive tasks with differing individual rewards, we design the Multi-agent Dueling Q-learning (MDQ) method based on mean-field theory and individual value decomposition. Specifically, we integrate mean-field theory with value decomposition to factorize the value function at the individual level, which makes it possible to handle mixed cooperative-competitive tasks. In addition, we adopt a dueling network architecture to distinguish which states are valuable, eliminating the need to learn the effect of each action in each state, thereby enabling efficient learning and leading to better policy evaluation. MDQ is applicable not only to cooperative tasks with shared rewards, but also to mixed cooperative-competitive tasks with individualized rewards. It is therefore flexible and general enough to apply to most multi-agent tasks. Empirical experiments on various mixed cooperative-competitive tasks demonstrate that MDQ significantly outperforms existing multi-agent reinforcement learning methods. © 2023 Elsevier Ltd. All rights reserved.
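
To make the two ingredients named in the abstract concrete, the following minimal sketch shows the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), and a mean-field style summary of neighbors' discrete actions as the average of their one-hot encodings. The abstract does not give MDQ's exact formulation, so the function names `dueling_q_values` and `mean_action`, the mean-subtracted aggregation, and the one-hot averaging are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and advantages A(s, a) into Q-values.

    Assumed aggregation (standard dueling form, not confirmed for MDQ):
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    """Summarize neighbors' discrete actions as the mean of their one-hot vectors.

    Assumed mean-field summary: each entry is the fraction of neighbors
    that chose the corresponding action.
    """
    counts = [0.0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1.0
    return [c / len(neighbor_actions) for c in counts]


if __name__ == "__main__":
    # Hypothetical numbers for one agent with three actions and four neighbors.
    print(dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0]))
    print(mean_action([0, 2, 2, 1], n_actions=3))
```

In this sketch the state value shifts all Q-values together while the advantages only rank the actions, which is one way to read the abstract's claim that the agent need not learn the effect of every action in every state. An individual-level Q-function could then take the agent's own action and the mean action as inputs instead of the full joint action.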

Deep Factorized Q-Learning for Large Scale Multi-Agent Learning

Attentional Factorized Q-Learning for Many-Agent Learning.

Factorized Q-learning for Large-Scale Multi-Agent Systems

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization.

Multi-agent Dueling Q-learning with Mean Field and Value Decomposition

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning

Value function factorization with dynamic weighting for deep multi-agent reinforcement learning

Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization

QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning

Multi-agent Q-learning with Joint State Value Approximation

Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning

DVF: Multi-agent Q-learning with difference value factorization

ConcaveQ: Non-Monotonic Value Function Factorization Via Concave Representations in Deep Multi-Agent Reinforcement Learning

Attention Based Large Scale Multi-agent Reinforcement Learning

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

Qrelation: an Agent Relation-Based Approach for Multi-Agent Reinforcement Learning Value Function Factorization.

Learning Nearly Decomposable Value Functions Via Communication Minimization

QPLEX: Duplex Dueling Multi-Agent Q-Learning.

The challenge of redundancy on multi-agent value factorisation