Abstract: Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes its policy by treating the other agents as part of the environment. Despite their empirical success, the theoretical guarantees for SP-based methods are limited to two-player zero-sum games. For mixed cooperative-competitive games, where agents on the same team need to cooperate with each other, we show a simple counterexample in which SP-based methods fail, with high probability, to converge to a global Nash equilibrium (NE). Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses, with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch until convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits of both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best-response policies. The main policy is trained by fictitious self-play and by cross-play against the counter population, while the counter policies are trained as best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitability than the baselines, and in a more challenging football game, where FXP defeats SOTA models with a win rate of over 94%.
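For readers unfamiliar with the "fictitious" part of the name, the following is a minimal sketch of classical fictitious play on a two-player zero-sum matrix game. This is the setting in which, as the abstract notes, self-play-style guarantees are known. It is not the FXP algorithm and involves no counter population or team structure. The game (rock-paper-scissors), the function name `fictitious_play`, the uniform initial counts, and the iteration budget are all illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
# Rows and columns are ordered rock, paper, scissors (illustrative choice).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def fictitious_play(payoff, iterations):
    """Classical fictitious play; returns both players' empirical strategies.

    Each player repeatedly best-responds to the opponent's empirical
    action counts so far. This is a hypothetical helper, not FXP.
    """
    n_rows, n_cols = payoff.shape
    # Start from uniform pseudo-counts (an assumption of this sketch).
    row_counts = np.ones(n_rows)
    col_counts = np.ones(n_cols)
    for _ in range(iterations):
        # Row player maximizes its payoff against the column history.
        row_action = np.argmax(payoff @ col_counts)
        # Column player receives the negated payoff in a zero-sum game.
        col_action = np.argmax(-(payoff.T @ row_counts))
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


avg_row, avg_col = fictitious_play(RPS, 20000)
```

In this zero-sum example the time-averaged strategies approach the uniform mixed equilibrium, even though the individual best responses keep cycling. The abstract's point is that guarantees of this kind do not carry over to mixed cooperative-competitive games, which is the gap FXP targets.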

NFSP-PLT: Solving Games with a Weighted NFSP-PER-Based Method

NFSP-PER: an Efficient Sampling NFSP-based Method with Prioritized Experience Replay

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Finding nash equilibrium for imperfect information games via fictitious play based on local regret minimization

Reinforcement Nash Equilibrium Solver

Strategy Optimization of Imperfect Information Games Based on NFSP with DDQN

Solving Large-Scale Extensive-Form Network Security Games via Neural Fictitious Self-Play

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker

An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games.

Learn to Predict Equilibria via Fixed Point Networks

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Operator Splitting for Learning to Predict Equilibria in Convex Games

Integrating Dynamic Weighted Approach with Fictitious Play and Pure Counterfactual Regret Minimization for Equilibrium Finding

A Unified Perspective on Deep Equilibrium Finding

MultiNash-PF: A Particle Filtering Approach for Computing Multiple Local Generalized Nash Equilibria in Trajectory Games

A Survey of Nash Equilibrium Strategy Solving Based on CFR