Abstract: Neural variants of counterfactual regret minimization (CFR) are among the most popular methods for learning a Nash equilibrium (NE) in large-scale imperfect-information extensive-form games (IIEFGs). CFR is a special case of Follow-The-Regularized-Leader (FTRL). At each iteration, the neural variants of CFR update the agent's strategy using estimated counterfactual regrets and then approximate the new strategy with a neural network, which incurs an approximation error. These errors accumulate because the counterfactual regrets at iteration t are estimated using the agent's past approximated strategies, and the accumulated error degrades performance. To address this, we propose a novel FTRL algorithm called FTRL-ORW, which does not use the agent's past strategies to pick the next-iteration strategy. More importantly, FTRL-ORW can update its strategy from trajectories sampled from the game, which makes it suitable for large-scale IIEFGs, where sampling multiple actions for each information set is too expensive. However, it remains unclear which algorithm to use to compute the next-iteration strategy for FTRL-ORW when only such sampled trajectories are revealed at iteration t. To address this problem and scale FTRL-ORW to large-scale games, we provide a model-free method called Deep FTRL-ORW, which computes the next-iteration strategy using model-free Maximum Entropy Deep Reinforcement Learning. Experimental results on two-player zero-sum IIEFGs show that Deep FTRL-ORW significantly outperforms existing model-free neural methods and OS-MCCFR.
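For readers unfamiliar with the strategy update that CFR performs from counterfactual regrets, the following is a minimal sketch of standard tabular regret matching at a single information set. It is not the FTRL-ORW or Deep FTRL-ORW update proposed here; the function names `regret_matching` and `update_regrets` and the toy action values are illustrative assumptions. In the neural variants discussed above, the exact regret table would be replaced by estimates and the resulting strategy by a network approximation, which is where the approximation error enters.

```python
def regret_matching(cumulative_regrets):
    """Return a strategy proportional to the positive part of each cumulative regret.

    Falls back to the uniform strategy when no regret is positive.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total <= 0.0:
        n = len(cumulative_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regrets(cumulative_regrets, action_values, strategy):
    """Add each action's instantaneous regret (its value minus the strategy's expected value)."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [r + (v - expected) for r, v in zip(cumulative_regrets, action_values)]


# Toy usage with hypothetical action values for one information set.
regrets = [0.0, 0.0, 0.0]
strategy = regret_matching(regrets)          # uniform at the first iteration
regrets = update_regrets(regrets, [1.0, 0.0, -1.0], strategy)
next_strategy = regret_matching(regrets)     # all mass moves to the first action
```

Because each iteration's regrets depend on the strategies played earlier, any error in those strategies carries forward into later updates, which is the accumulation effect described above.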
