Abstract: Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, in which each agent optimizes its own policy by treating the other agents as part of the environment. Despite its empirical successes, theoretical guarantees for SP-based methods are limited to two-player zero-sum games. For mixed cooperative-competitive games, where agents on the same team must cooperate with each other, we give a simple counter-example in which SP-based methods fail, with high probability, to converge to a global Nash equilibrium (NE). An alternative is Policy-Space Response Oracles (PSRO), an iterative framework for learning NEs in which each iteration learns best responses to the previous policies. PSRO extends directly to mixed cooperative-competitive settings by jointly learning team best responses, with all of its convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch until convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits of both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best-response policies. The main policy is trained by fictitious self-play and by cross-play against the counter population, while the counter policies are trained as best responses to past versions of the main policy. We validate our method in matrix games and show that FXP converges to global NEs where SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitability than the baselines, and in a more challenging football game, where FXP defeats state-of-the-art models with a win rate of over 94%.
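Two notions the abstract relies on, best responses to an opponent's past play and exploitability, can be illustrated in the simplest setting where the guarantees mentioned above do hold: a two-player zero-sum matrix game. The sketch below is not FXP and is not taken from the paper. It is classical fictitious play on rock-paper-scissors, and the names `exploitability`, `fictitious_play`, and `RPS_PAYOFF` are hypothetical, chosen only for this illustration.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (illustrative game, not from the paper).
# The column player receives the negative of each entry (zero-sum).
RPS_PAYOFF = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])


def exploitability(payoff, row_strategy, col_strategy):
    """Total gain available to both players from deviating to a best response.

    The value is non-negative and equals zero exactly at a Nash equilibrium
    of the zero-sum matrix game.
    """
    # Best payoff the row player can obtain against the column strategy.
    row_best = np.max(payoff @ col_strategy)
    # Best payoff the column player can obtain against the row strategy.
    col_best = np.max(-(row_strategy @ payoff))
    return float(row_best + col_best)


def fictitious_play(payoff, iterations=5000):
    """Classical fictitious play: each player repeatedly best-responds to the
    opponent's empirical average of past actions."""
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions (an assumption of this sketch).
    row_counts[0] = 1.0
    col_counts[0] = 1.0
    for _ in range(iterations):
        row_avg = row_counts / row_counts.sum()
        col_avg = col_counts / col_counts.sum()
        # Row player maximizes its payoff; column player minimizes it.
        row_counts[np.argmax(payoff @ col_avg)] += 1.0
        col_counts[np.argmin(row_avg @ payoff)] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


if __name__ == "__main__":
    row, col = fictitious_play(RPS_PAYOFF)
    print("average row strategy:", row)
    print("average column strategy:", col)
    print("exploitability:", exploitability(RPS_PAYOFF, row, col))
```

In this zero-sum setting the average strategies approach the uniform equilibrium and the exploitability shrinks toward zero. The abstract's point is that this kind of guarantee does not carry over to mixed cooperative-competitive games when each agent learns independently.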

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games