Abstract: Neural variants of counterfactual regret minimization (CFR) are among the most popular methods for learning a Nash equilibrium (NE) in large-scale imperfect-information extensive-form games (IIEFGs). CFR is a special case of Follow-The-Regularized-Leader (FTRL). At each iteration, the neural variants of CFR update the agent's strategy using estimated counterfactual regrets and then approximate the new strategy with neural networks, which incurs an approximation error. These errors accumulate because the counterfactual regrets at iteration t are estimated using the agent's past approximated strategies, and the accumulated error degrades performance. To address this, we propose a novel FTRL algorithm called FTRL-ORW, which does not use the agent's past strategies to pick the next-iteration strategy. More importantly, FTRL-ORW can update its strategy from trajectories sampled from the game, which suits large-scale IIEFGs, where sampling multiple actions at each information set is too expensive. However, it remains unclear which algorithm should compute the next-iteration strategy for FTRL-ORW when only such sampled trajectories are revealed at iteration t. To address this problem and scale FTRL-ORW to large games, we provide a model-free method called Deep FTRL-ORW, which computes the next-iteration strategy using model-free Maximum Entropy Deep Reinforcement Learning. Experimental results on two-player zero-sum IIEFGs show that Deep FTRL-ORW significantly outperforms existing model-free neural methods and OS-MCCFR.
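For readers unfamiliar with FTRL, the following is a minimal sketch of plain entropy-regularized FTRL (equivalent to the Hedge / multiplicative-weights update) run in self-play on a small two-player zero-sum matrix game. It is not FTRL-ORW or Deep FTRL-ORW, whose details the abstract does not give; it only illustrates the generic FTRL update that those methods build on. The function names, the payoff matrix, the step size `eta`, and the iteration count are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def ftrl_self_play(payoff, iterations=10000, eta=0.02):
    """Entropy-regularized FTRL for both players of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    -payoff[i][j]. Returns the time-averaged strategies (hypothetical helper,
    not the paper's algorithm).
    """
    n, m = len(payoff), len(payoff[0])
    cum_row = [0.0] * n  # cumulative expected payoff of each row action
    cum_col = [0.0] * m  # cumulative expected payoff of each column action
    avg_row = [0.0] * n
    avg_col = [0.0] * m
    for _ in range(iterations):
        # FTRL with a negative-entropy regularizer has the closed form
        # strategy = softmax(eta * cumulative payoffs).
        x = softmax([eta * c for c in cum_row])
        y = softmax([eta * c for c in cum_col])
        for i in range(n):
            avg_row[i] += x[i] / iterations
        for j in range(m):
            avg_col[j] += y[j] / iterations
        # Each player observes the expected payoff of every pure action
        # against the opponent's current mixed strategy.
        for i in range(n):
            cum_row[i] += sum(payoff[i][j] * y[j] for j in range(m))
        for j in range(m):
            cum_col[j] -= sum(payoff[i][j] * x[i] for i in range(n))
    return avg_row, avg_col


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains against (x, y)."""
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


if __name__ == "__main__":
    # Assumed example game: an asymmetric matching-pennies variant.
    game = [[1.0, -1.0], [-0.5, 0.5]]
    x_avg, y_avg = ftrl_self_play(game)
    print("row:", x_avg, "col:", y_avg)
    print("exploitability:", exploitability(game, x_avg, y_avg))
```

Note that this sketch uses full-information feedback: every action's expected payoff is observed at each iteration. The abstract's setting is harder, since only sampled trajectories are revealed, which is the gap Deep FTRL-ORW is stated to address.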
