Abstract: Neural variants of counterfactual regret minimization (CFR) are among the most popular methods for learning a Nash equilibrium (NE) in large-scale imperfect-information extensive-form games (IIEFGs). CFR is a special case of Follow-The-Regularized-Leader (FTRL). At each iteration, the neural variants of CFR update the agent's strategy using estimated counterfactual regrets, then use neural networks to approximate the new strategy, which incurs an approximation error. These errors accumulate because the counterfactual regrets at iteration t are estimated using the agent's past approximated strategies, and the accumulated error degrades performance. To address this, we propose a novel FTRL algorithm called FTRL-ORW, which does not use the agent's past strategies to pick the next-iteration strategy. More importantly, FTRL-ORW can update its strategy from trajectories sampled from the game, which makes it suitable for large-scale IIEFGs, where sampling multiple actions at each information set is too expensive. However, it remains unclear which algorithm should compute the next-iteration strategy for FTRL-ORW when only such sampled trajectories are revealed at iteration t. To address this problem and scale FTRL-ORW to large games, we provide a model-free method called Deep FTRL-ORW, which computes the next-iteration strategy using model-free Maximum Entropy Deep Reinforcement Learning. Experimental results on two-player zero-sum IIEFGs show that Deep FTRL-ORW significantly outperforms existing model-free neural methods and OS-MCCFR.
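For readers unfamiliar with FTRL, the following is a minimal sketch of generic entropy-regularized FTRL in self-play on a small two-player zero-sum matrix game. It is not FTRL-ORW or Deep FTRL-ORW, whose details the abstract does not give, and it uses a normal-form game rather than an IIEFG. The payoff matrix `BIASED_RPS`, the step size `eta`, and all function names are assumptions chosen for illustration. With a negative-entropy regularizer, the FTRL update has the closed form of a softmax over cumulative action values, and the average strategies approach a Nash equilibrium.

```python
import math

# Hypothetical example game: a biased rock-paper-scissors.
# Entry [i][j] is the row player's payoff; the column player receives the negative.
BIASED_RPS = [
    [0.0, -1.0, 2.0],
    [1.0, 0.0, -1.0],
    [-2.0, 1.0, 0.0],
]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ftrl_entropy_step(cum_values, eta):
    """FTRL with a negative-entropy regularizer on the simplex.

    Maximizing <x, cum_values> + H(x) / eta over the simplex
    gives x = softmax(eta * cum_values).
    """
    return softmax([eta * v for v in cum_values])


def self_play(payoff, iterations, eta):
    """Run both players with FTRL and return their average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    cum_row, cum_col = [0.0] * n_rows, [0.0] * n_cols
    sum_x, sum_y = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = ftrl_entropy_step(cum_row, eta)
        y = ftrl_entropy_step(cum_col, eta)
        # Expected value of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(x[i] * payoff[i][j] for i in range(n_rows))
                      for j in range(n_cols)]
        for i in range(n_rows):
            cum_row[i] += row_values[i]
            sum_x[i] += x[i]
        for j in range(n_cols):
            cum_col[j] += col_values[j]
            sum_y[j] += y[j]
    return ([s / iterations for s in sum_x],
            [s / iterations for s in sum_y])


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains against (x, y)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(x[i] * payoff[i][j] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


if __name__ == "__main__":
    avg_x, avg_y = self_play(BIASED_RPS, iterations=10000, eta=0.01)
    print("average row strategy:", avg_x)
    print("exploitability:", exploitability(BIASED_RPS, avg_x, avg_y))
```

In this toy setting every action value is computed exactly at each iteration. The setting described in the abstract differs in that only sampled trajectories are available, which is the gap Deep FTRL-ORW is said to address with maximum-entropy deep reinforcement learning.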

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Unknown Environments

Double Neural Counterfactual Regret Minimization

An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games

Lazy-CFR: a Fast Regret Minimization Algorithm for Extensive Games with Imperfect Information

Imization for extensive games with imperfect information

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

RLCFR: Minimize counterfactual regret by deep reinforcement learning

RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning

Deep Counterfactual Regret Minimization

No-Regret Learning in Extensive-Form Games with Imperfect Recall

D2CFR: Minimize Counterfactual Regret With Deep Dueling Neural Network

Efficient CFR for Imperfect Information Games with Instant Updates

Model-Free Neural Counterfactual Regret Minimization with Bootstrap Learning

Finding Nash equilibrium for imperfect information games via fictitious play based on local regret minimization

A Survey of Nash Equilibrium Strategy Solving Based on CFR

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Parallel Counterfactual Regret Minimization in Crowdsourcing Imperfect-information Expanded Game

Modeling Other Players with Bayesian Beliefs for Games with Incomplete Information

Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play

Kdb-D2CFR: Solving Multiplayer imperfect-information games with knowledge distillation-based DeepCFR