Abstract: Extensive-form games provide a versatile framework for modeling interactions of multiple agents subject to imperfect observations and stochastic events. In recent years, two paradigms, policy space response oracles (PSRO) and counterfactual regret minimization (CFR), have shown that extensive-form games can indeed be solved efficiently. Both can leverage deep neural networks to tackle the scalability issues inherent to extensive-form games, and we refer to them as deep equilibrium-finding algorithms. Even though PSRO and CFR share some similarities, they are often regarded as distinct, and the question of which is superior remains open. Instead of answering this question directly, in this work we propose a unified perspective on deep equilibrium finding that generalizes both PSRO and CFR. Our four main contributions are: i) a novel response oracle (RO) which computes Q values as well as reaching probability values and baseline values; ii) two transform modules -- a pre-transform and a post-transform -- represented by neural networks that map the outputs of the RO to a latent additive space (LAS), and then map the LAS to action probabilities for execution; iii) two average oracles -- a local average oracle (LAO) and a global average oracle (GAO) -- where the LAO operates on the LAS and the GAO is used for evaluation only; and iv) a novel method inspired by fictitious play that optimizes the transform modules and average oracles, and automatically selects the optimal combination of components of the two frameworks. Experiments on the Leduc poker game demonstrate that our approach can outperform both frameworks.

A Unified Perspective on Deep Equilibrium Finding