Abstract: Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes its policy by treating the other agents as part of the environment. Despite its empirical successes, the theoretical guarantees of SP-based methods are limited to two-player zero-sum games. For mixed cooperative-competitive games, where agents on the same team need to cooperate with each other, we show a simple counter-example in which SP-based methods fail, with high probability, to converge to a global Nash equilibrium (NE). Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses, with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch until convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits of both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and in a more challenging football game, where FXP defeats SOTA models with over 94% win rate.
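
The gap between per-agent optimization and a global NE can be illustrated with a very small example. The sketch below is a hypothetical construction for illustration only, not the counter-example from the paper: two teammates each pick action 0 or 1, the opposing team is held fixed, and the payoff table `TEAM_PAYOFF` is made up. It checks whether a joint action is stable against single-agent deviations and whether it is also the best joint action for the team.

```python
from itertools import product

# Hypothetical team payoff (an assumption for illustration, not from the paper).
# Keys are joint actions (teammate 0, teammate 1); the opposing team is fixed.
TEAM_PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}


def is_agentwise_stable(joint):
    """True if no single teammate can raise the team payoff by deviating alone."""
    base = TEAM_PAYOFF[joint]
    for agent in range(2):
        for action in (0, 1):
            deviated = list(joint)
            deviated[agent] = action
            if TEAM_PAYOFF[tuple(deviated)] > base:
                return False
    return True


def is_team_best_response(joint):
    """True if no joint deviation by the whole team yields a higher payoff."""
    return TEAM_PAYOFF[joint] == max(TEAM_PAYOFF.values())


for joint in product((0, 1), repeat=2):
    print(joint, is_agentwise_stable(joint), is_team_best_response(joint))
```

In this toy table, the joint action (0, 0) passes the agent-wise check but fails the team-wise one: neither teammate gains by switching alone, yet switching together to (1, 1) doubles the payoff. Learners that only consider unilateral changes can therefore settle on (0, 0).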

What problem does this paper attempt to address?

The paper addresses the problem of finding a global Nash equilibrium (NE) in mixed cooperative-competitive games. Specifically, it considers multi-agent environments with two competing teams, where the members of each team must cooperate to compete against the other team. The goal is to reach a state in which no team can achieve a higher payoff by changing its strategy, i.e., a global NE.

Self-play (SP) methods perform well empirically, but their theoretical guarantees are limited to two-player zero-sum games. In mixed cooperative-competitive games they may converge to a suboptimal local NE instead of a global one. Policy-Space Response Oracles (PSRO), on the other hand, retains its convergence properties in this setting when team best responses are learned jointly, but it requires repeatedly training joint policies from scratch until convergence, which makes it hard to scale to complex games.

To address these issues, the paper proposes a new algorithm, Fictitious Cross-Play (FXP), which combines the advantages of SP and PSRO. FXP simultaneously trains an SP-based main policy and a counter population of best-response policies. The main policy is trained by fictitious self-play (playing against itself and its own past versions) and by cross-play against the counter population, while the counter policies are trained as best responses to past versions of the main policy.

According to the reported experiments, FXP converges to global NEs in matrix games where SP methods fail, achieves higher Elo ratings and lower exploitability than the baselines in a gridworld domain, and defeats state-of-the-art models with a win rate above 94% in a more challenging football game.
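
The opponent matching described above, for the main (primary) policy and the counter (adversarial) population, can be sketched as two sampling functions. This is a minimal illustration of the data flow only, not the paper's implementation: the uniform sampling, the `cross_play_prob` parameter, and the checkpoint labels are all assumptions introduced here, and the actual training of the policies is omitted.

```python
import random


def sample_main_opponent(main_history, counter_population, rng, cross_play_prob=0.5):
    """Pick an opponent for the main policy.

    With probability `cross_play_prob` (a hypothetical knob) the opponent is
    drawn from the counter population (cross-play); otherwise it is a stored
    version of the main policy (fictitious self-play).
    """
    if counter_population and rng.random() < cross_play_prob:
        return rng.choice(counter_population)
    return rng.choice(main_history)


def sample_counter_opponent(main_history, rng):
    """Counter policies train only against stored versions of the main policy."""
    return rng.choice(main_history)


rng = random.Random(0)
main_history = ["main_v0", "main_v1", "main_v2"]  # illustrative checkpoint labels
counter_population = ["counter_0", "counter_1"]   # illustrative counter policies

main_opponents = [
    sample_main_opponent(main_history, counter_population, rng) for _ in range(200)
]
counter_opponents = [sample_counter_opponent(main_history, rng) for _ in range(200)]
```

The asymmetry is the point of the sketch: the main policy sees both its own history and the counter population, whereas each counter policy sees only the main policy's history, so its role is to find weaknesses in the main policy.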

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Offline Fictitious Self-Play for Competitive Games

Efficient Competitive Self-Play Policy Optimization

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

Team-Fictitious Play for Reaching Team-Nash Equilibrium in Multi-team Games

Reinforcement Nash Equilibrium Solver

A Generalized Training Approach for Multiagent Learning

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles

Neural Auto-Curricula in Two-Player Zero-Sum Games

Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions

Neural Auto-Curricula

Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games

In-Context Exploiter for Extensive-Form Games

A Unified Perspective on Deep Equilibrium Finding

Empirical Policy Optimization for n-Player Markov Games

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking