Abstract: One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning with game theory. Most existing algorithms rely on computationally expensive mixed strategy equilibria and require each agent to replicate the other agents' value functions in order to compute an equilibrium in each state. This is unrealistic, since agents may be unwilling to share such information because of privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, because a mixed strategy equilibrium is often computationally expensive to find. Three types of pure strategy profiles are used as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two are strategy profiles from which agents can gain higher payoffs than from one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, because value functions are not shared among agents, we propose a multistep negotiation process for finding pure strategy equilibria. Combining the two, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
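To make the idea of finding pure strategy equilibria without sharing value functions more concrete, here is a minimal sketch for a single state. It is an illustration under stated assumptions, not the NegoQ protocol itself: the function names, the dictionary representation of each agent's private values, and the example payoffs are all hypothetical. Each agent announces only a set of joint actions, never its values, and the announced sets are intersected. The second function follows the abstract's description of an equilibrium-dominating strategy profile as one that gives agents higher payoffs than some pure strategy Nash equilibrium, interpreted here as strictly higher for every agent.

```python
from itertools import product


def best_response_profiles(q, agent, n_actions):
    """Joint actions at which `agent`'s own action is a best response.

    `q` maps a joint action (tuple of action indices) to this agent's own
    private value; it is never shown to the other agents.
    """
    profiles = set()
    for joint in product(*(range(n) for n in n_actions)):
        # Values the agent would get from each unilateral deviation.
        alternatives = [
            q[joint[:agent] + (a,) + joint[agent + 1:]]
            for a in range(n_actions[agent])
        ]
        if q[joint] >= max(alternatives):
            profiles.add(joint)
    return profiles


def negotiate_pure_nash(q_tables, n_actions):
    """Intersect the announced sets; a joint action in every set is a pure NE."""
    announced = [
        best_response_profiles(q, i, n_actions) for i, q in enumerate(q_tables)
    ]
    return set.intersection(*announced)


def negotiate_dominating(q_tables, equilibria):
    """Joint actions that every agent strictly prefers to some pure NE.

    Assumption: "higher payoffs" is read as strictly higher for all agents.
    """
    result = set()
    for e in equilibria:
        # Each agent announces only the set of profiles it prefers to `e`.
        announced = [{a for a in q if q[a] > q[e]} for q in q_tables]
        result |= set.intersection(*announced)
    return result


# Hypothetical prisoner's dilemma: action 0 = cooperate, 1 = defect.
pd_q0 = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}
pd_q1 = {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 1}
pd_nash = negotiate_pure_nash([pd_q0, pd_q1], (2, 2))
pd_dominating = negotiate_dominating([pd_q0, pd_q1], pd_nash)
print(pd_nash, pd_dominating)
```

In this hypothetical prisoner's dilemma, mutual defection is the only pure strategy Nash equilibrium, while mutual cooperation is the profile both agents prefer to it, which is the kind of outcome the equilibrium-dominating concept is meant to capture.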

Sample-Efficient Multi-Agent RL: An Optimization Perspective

Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning

Refined Sample Complexity for Markov Games with Independent Linear Function Approximation

Sample-efficient multi-agent reinforcement learning with masked reconstruction

Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium

Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Efficient Multi-agent Reinforcement Learning by Planning

Multiagent Reinforcement Learning with Unshared Value Functions

Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games

Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

Provable Memory Efficient Self-Play Algorithm for Model-free Reinforcement Learning

Towards Efficient Multi-Agent Learning Systems

Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Equilibrium Selection for Multi-agent Reinforcement Learning: A Unified Framework

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning