Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.

What problem does this paper attempt to address?

The paper asks how to apply reinforcement learning (RL) to train agents in a many-player, non-zero-sum game, specifically No-Press Diplomacy, so that they can handle complex multi-player interactions, simultaneous moves, and a huge combinatorial action space. The authors aim to address three challenges:

1. **Coping with mixed-motive interactions**: Diplomacy emphasizes the tension between competition and cooperation, which makes it well suited to studying learning in mixed-motive settings. RL methods have been applied most successfully to two-player zero-sum games, whereas in many-player games the relationships between players combine cooperation and competition.
2. **Handling simultaneous moves**: All players choose their actions at the same time without knowing the choices of the others, so an agent must anticipate the other players' strategies.
3. **Overcoming the huge combinatorial action space**: Diplomacy has an extremely large combinatorial action space. The estimated size of the game tree is \(10^{900}\), and the number of legal joint actions per turn is between \(10^{21}\) and \(10^{64}\). An action space of this scale poses a significant challenge to existing RL algorithms.

To address these problems, the authors propose a Sampled Best Response (SBR) operator and introduce a family of policy iteration methods that approximate iterated best response and fictitious play. With these methods, they successfully apply RL to No-Press Diplomacy and show that their agents convincingly outperform the previous state of the art. A game-theoretic equilibrium analysis further shows that the new process yields consistent improvements.
In summary, the core objective of this paper is to develop effective deep reinforcement learning methods for a complex, many-player strategy game in which the players' interests are partly shared and partly opposed.
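The idea of a sampled approximate best response can be illustrated with a small sketch. The summary above does not spell out how the authors' operator works, so everything here is an assumption made for illustration: the function name `sampled_best_response`, its parameters, and the rock-paper-scissors toy game are hypothetical, and the other players are collapsed into a single opponent. The sketch only shows the general technique: sample a handful of candidate actions instead of enumerating the full action space, then score each candidate against sampled opponent actions.

```python
import random
from typing import Callable, Hashable

Action = Hashable

def sampled_best_response(
    candidate_sampler: Callable[[], Action],
    opponent_sampler: Callable[[], Action],
    payoff: Callable[[Action, Action], float],
    num_candidates: int = 16,
    num_opponent_samples: int = 32,
) -> Action:
    """Return the sampled candidate with the highest mean payoff.

    Hypothetical illustration, not the paper's implementation: the action
    space is never enumerated, only sampled from.
    """
    # Sample what the opponent might do (simultaneous moves: we cannot observe it).
    opponent_actions = [opponent_sampler() for _ in range(num_opponent_samples)]
    # Sample candidate actions and drop duplicates while keeping order.
    candidates = list(dict.fromkeys(candidate_sampler() for _ in range(num_candidates)))

    def mean_payoff(action: Action) -> float:
        # Monte Carlo estimate of the expected payoff of `action`.
        return sum(payoff(action, o) for o in opponent_actions) / len(opponent_actions)

    return max(candidates, key=mean_payoff)

# Toy stand-in for a game: rock-paper-scissors (an assumption, not Diplomacy).
ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def rps_payoff(mine: str, theirs: str) -> float:
    if mine == theirs:
        return 0.0
    return 1.0 if BEATS[mine] == theirs else -1.0

if __name__ == "__main__":
    rng = random.Random(0)
    # Opponent policy biased toward "rock"; candidates drawn uniformly.
    opponent = lambda: rng.choices(ACTIONS, weights=[0.8, 0.1, 0.1])[0]
    candidates = lambda: rng.choice(ACTIONS)
    print(sampled_best_response(candidates, opponent, rps_payoff))
```

In a game the size of Diplomacy, the payoff function would have to be replaced by some learned estimate of an action's value, and the samplers by learned policies. Those components are outside the scope of this sketch.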

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

No-Press Diplomacy from Scratch

No Press Diplomacy: Modeling Multi-Agent Gameplay

Human-Level Performance in No-Press Diplomacy via Equilibrium Search

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

Learning to Play General-Sum Games against Multiple Boundedly Rational Agents

Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games

BRExIt: On Opponent Modelling in Expert Iteration

Achieving Correlated Equilibrium by Studying Opponent's Behavior Through Policy-Based Deep Reinforcement Learning

Negotiation and honesty in artificial intelligence methods for the board game of Diplomacy

Best Response Shaping

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy

Deep Reinforcement Learning for Nash Equilibrium of Differential Games

Using Graph-Aware Reinforcement Learning to Identify Winning Strategies in Diplomacy Games (Student Abstract)

Efficacy of Language Model Self-Play in Non-Zero-Sum Games

Robust Reinforcement Learning through Efficient Adversarial Herding

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

Evaluation and Learning in Two-Player Symmetric Games via Best and Better Responses

Stackelberg Batch Policy Learning