Abstract: One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning with game theory. Most existing algorithms involve computationally expensive calculation of mixed strategy equilibria and require each agent to replicate the other agents' value functions to compute equilibria in each state. This is unrealistic, since agents may be unwilling to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, since the latter are often computationally expensive. In this paper, three types of pure strategy profiles are used as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two are strategy profiles under which agents can gain higher payoffs than under one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, because value functions are not shared among agents, we propose a multistep negotiation process for finding pure strategy equilibria. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
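The abstract says agents find pure strategy equilibria by negotiating instead of sharing value functions. The sketch below shows one way this can work in a single stage game, assuming each agent holds a private payoff table over joint actions (for example, its Q-values in the current state) and reveals only sets of joint actions. Each agent lists the joint actions in which its own action is a best response to the others' actions, and the intersection of these lists is the set of pure strategy Nash equilibria. It is not the paper's full multistep negotiation procedure. The function names are hypothetical. The dominance test, under which every agent is strictly better off than at a given equilibrium, is an illustrative assumption and not the paper's formal definition of an equilibrium-dominating strategy profile.

```python
import numpy as np

JointAction = tuple[int, ...]


def best_response_set(payoff: np.ndarray, agent: int) -> set[JointAction]:
    """Joint actions in which `agent`'s own action is a best response.

    `payoff` is this agent's private table with one axis per agent.
    Only the owning agent reads it; it announces just the resulting set.
    """
    # Best achievable value for each fixed combination of the other agents' actions.
    best = payoff.max(axis=agent, keepdims=True)
    return {tuple(map(int, idx)) for idx in np.argwhere(payoff == best)}


def pure_nash_equilibria(payoffs: list[np.ndarray]) -> set[JointAction]:
    # A joint action is a pure strategy Nash equilibrium iff every agent's
    # action in it is a best response, i.e. it lies in every announced set.
    sets = [best_response_set(q, i) for i, q in enumerate(payoffs)]
    return set.intersection(*sets)


def dominating_profiles(payoffs: list[np.ndarray], equilibrium: JointAction) -> set[JointAction]:
    # ASSUMPTION (illustrative only): a profile "dominates" the equilibrium
    # if every agent's own payoff there is strictly higher. Each agent checks
    # this against its private table and announces only the resulting set.
    sets = [
        {tuple(map(int, idx)) for idx in np.argwhere(q > q[equilibrium])}
        for q in payoffs
    ]
    return set.intersection(*sets)


if __name__ == "__main__":
    # Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
    q_row = np.array([[3, 0], [5, 1]])  # row agent's payoffs
    q_col = q_row.T                     # column agent's payoffs (symmetric game)
    print(pure_nash_equilibria([q_row, q_col]))         # {(1, 1)}
    print(dominating_profiles([q_row, q_col], (1, 1)))  # {(0, 0)}
```

In this example mutual defection `(1, 1)` is the only pure strategy Nash equilibrium, while mutual cooperation `(0, 0)` gives both agents a higher payoff. Under the assumed dominance test, that is the kind of profile the abstract describes as equilibrium-dominating.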

Expected Lenient Q-learning: a fast variant of the Lenient Q-learning algorithm for cooperative stochastic Markov games

Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games

Multi-Agent Alternate Q-Learning

MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning

Lenient Multi-Agent Deep Reinforcement Learning

LOQA: Learning with Opponent Q-Learning Awareness

Best Possible Q-Learning

Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization

I2Q: A Fully Decentralized Q-Learning Algorithm

Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

A novel multi-agent Q-learning algorithm in cooperative multi-agent system

FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

Multiagent Reinforcement Learning with Unshared Value Functions

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

A finite time analysis of distributed Q-learning

Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion