Abstract: One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning with game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require each agent to replicate the other agents' value functions in order to compute equilibria in each state. This is unrealistic, since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, given that a mixed strategy equilibrium is often computationally expensive to compute. Three types of pure strategy profiles are used as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles in which agents can gain higher payoffs than in one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, because value functions are not shared among agents, we propose a multistep negotiation process for finding pure strategy equilibria. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, as compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
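To make the three solution concepts concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a one-shot game given as a payoff table, and it assumes that an equilibrium-dominating profile is one in which every agent is strictly better off than in at least one pure strategy Nash equilibrium (at least as well off for the nonstrict variant). These formal definitions, the function names, and the prisoner's-dilemma payoffs are illustrative assumptions; the abstract itself states only that agents gain higher payoffs than in one or more equilibria.

```python
from itertools import product


def pure_nash_equilibria(payoffs, n_actions):
    """Enumerate pure strategy Nash equilibria.

    payoffs maps a joint action (tuple of ints) to a tuple of per-agent payoffs.
    n_actions gives the number of actions available to each agent.
    """
    equilibria = []
    for joint in product(*(range(n) for n in n_actions)):
        # A profile is an equilibrium if no agent gains by deviating alone.
        stable = True
        for i, n in enumerate(n_actions):
            for a in range(n):
                deviation = joint[:i] + (a,) + joint[i + 1:]
                if payoffs[deviation][i] > payoffs[joint][i]:
                    stable = False
        if stable:
            equilibria.append(joint)
    return equilibria


def dominating_profiles(payoffs, equilibria, strict=True):
    """Profiles in which all agents do better than in some equilibrium.

    Assumed definition: strict=True requires a strictly higher payoff for
    every agent; strict=False requires a payoff at least as high.
    """
    found = []
    for joint, u in payoffs.items():
        if joint in equilibria:
            continue
        for eq in equilibria:
            v = payoffs[eq]
            if strict:
                better = all(x > y for x, y in zip(u, v))
            else:
                better = all(x >= y for x, y in zip(u, v))
            if better:
                found.append(joint)
                break
    return found


# Hypothetical prisoner's dilemma: action 0 = cooperate, 1 = defect.
pd = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}
pne = pure_nash_equilibria(pd, (2, 2))
print(pne)                                         # [(1, 1)]
print(dominating_profiles(pd, pne, strict=True))   # [(0, 0)]
```

In this toy game mutual defection is the only pure strategy Nash equilibrium, while mutual cooperation gives both agents a higher payoff, which is the kind of profile the equilibrium-dominating concepts are meant to capture. Note that this sketch reads every agent's payoffs from one shared table; the negotiation process described in the abstract exists precisely because agents do not share that information.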

Correcting Biased Value Estimation in Mixing Value-Based Multi-Agent Reinforcement Learning by Multiple Choice Learning

S2rl

Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning

Regularized Softmax Deep Multi-Agent Q-Learning

QPLEX: Duplex Dueling Multi-Agent Q-Learning

MCMARL: Parameterizing Value Function Via Mixture of Categorical Distributions for Multi-Agent Reinforcement Learning

Multiagent Reinforcement Learning with Unshared Value Functions

Learning Multi-Agent Cooperation via Considering Actions of Teammates

Multi-Agent Q-Value Mixing Network with Covariance Matrix Adaptation Strategy for the Voltage Regulation Problem

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning

An Overestimation Reduction Method Based on the Multi-step Weighted Double Estimation Using Value-Decomposition Multi-agent Reinforcement Learning

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of Multi-Agent Reinforcement Learning

Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors

MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning

Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients

MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning Via Mixing Recurrent Soft Decision Trees

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

MAR2MIX: A Novel Model for Dynamic Problem in Multi-agent Reinforcement Learning