Abstract: One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning with game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require each agent to replicate the other agents' value functions in order to compute an equilibrium in each state. This is unrealistic, since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, because a mixed strategy equilibrium is often computationally expensive to compute. Three types of pure strategy profiles are used as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two are strategy profiles in which agents can gain higher payoffs than in one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, because value functions are not shared among agents, we propose a multistep negotiation process for finding pure strategy equilibria. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, NegoQ also performs well in team Markov games such as pursuit games when compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
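
The solution concepts named in the abstract can be illustrated on a single one-shot game. The sketch below is not the paper's NegoQ procedure. It is a minimal illustration under stated assumptions: each agent holds only its own payoff table (a hypothetical dict from joint actions to payoffs), reports the set of joint actions at which it has no incentive to deviate, and the intersection of those sets is taken as the set of pure strategy Nash equilibria. The function names, the table layout, and the exact dominance test (every agent strictly better off, or at least as well off, than in some pure strategy Nash equilibrium) are assumptions based on the abstract's wording, not definitions taken from the paper.

```python
from itertools import product


def best_response_set(q_i, agent, n_actions):
    """Joint actions at which `agent` cannot gain by deviating alone.

    Uses only the agent's own payoff table q_i, so no value function
    has to be shared; only the resulting set of joint actions is exchanged.
    """
    result = set()
    for joint in product(*(range(n) for n in n_actions)):
        # Best payoff the agent could reach by changing only its own action.
        best = max(
            q_i[joint[:agent] + (a,) + joint[agent + 1:]]
            for a in range(n_actions[agent])
        )
        if q_i[joint] == best:
            result.add(joint)
    return result


def pure_nash(q_tables, n_actions):
    """Pure strategy Nash equilibria as the intersection of the agents' sets."""
    sets = [best_response_set(q, i, n_actions) for i, q in enumerate(q_tables)]
    return set.intersection(*sets)


def equilibrium_dominating(q_tables, n_actions, strict=True):
    """Non-equilibrium profiles that beat at least one pure Nash equilibrium.

    Assumed reading of the abstract: with strict=True every agent must be
    strictly better off; with strict=False every agent must be at least
    as well off.
    """
    equilibria = pure_nash(q_tables, n_actions)
    dominating = set()
    for joint in product(*(range(n) for n in n_actions)):
        if joint in equilibria:
            continue
        for eq in equilibria:
            if strict:
                ok = all(q[joint] > q[eq] for q in q_tables)
            else:
                ok = all(q[joint] >= q[eq] for q in q_tables)
            if ok:
                dominating.add(joint)
                break
    return dominating


# Prisoner's dilemma as a toy example: action 0 = cooperate, 1 = defect.
q0 = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}  # agent 0's payoffs
q1 = {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 1}  # agent 1's payoffs

print(pure_nash([q0, q1], (2, 2)))               # {(1, 1)}
print(equilibrium_dominating([q0, q1], (2, 2)))  # {(0, 0)}
```

In this toy game mutual defection is the only pure strategy Nash equilibrium, while mutual cooperation gives both agents a higher payoff than that equilibrium, which is the kind of profile the abstract describes as equilibrium-dominating.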

A Multitier Reinforcement Learning Model for a Cooperative Multiagent System

Learning Intra-group Cooperation in Multi-agent Systems

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

Developing cooperative policies for multi-stage reinforcement learning tasks

Modeling the Interaction Between Agents in Cooperative Multi-Agent Reinforcement Learning

A Two-Layered Multi-Agent Reinforcement Learning Model and Algorithm

Reinforcement learning for encouraging cooperation in a multiagent system

Individual Reward Assisted Multi-Agent Reinforcement Learning

Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks

Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation

Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning

Multiagent Reinforcement Learning with Unshared Value Functions

Policy Diversity for Cooperative Agents

A Cooperative Multi-Agent Reinforcement Learning Method Based on Coordination Degree

LMRL: a Multi-Agent Reinforcement Learning Model and Algorithm

Multi-agent cooperation through learning-aware policy gradients

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

A Cooperative Multi-Agent Reinforcement Learning Algorithm Based on Dynamic Self-Selection Parameters Sharing

Cooperative and Competitive Biases for Multi-Agent Reinforcement Learning

Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles