Abstract: To compute the optimal strategy in competitive games, algorithms have been developed to reach the Nash equilibrium. Current deep learning algorithms have succeeded in many games; however, optimizing them to approach the Nash equilibrium in imperfect-information games such as StarCraft and Poker remains challenging. Neural Fictitious Self-Play (NFSP) is an effective end-to-end algorithm for learning an approximate Nash equilibrium in imperfect-information games. However, because a player in NFSP trains its best response against its opponents’ past strategies, a discrepancy exists between the optimal strategy and the learned best response once the player updates its strategy. We call this discrepancy the optimality gap. During training, the optimality gap does not decay monotonically, which causes suboptimal results or unstable convergence of NFSP. We improve the performance of NFSP by making the optimality gap decay monotonically. In this study, we propose Regret Minimization Fictitious Self-Play (RM-FSP), which applies a regret minimization method to compute NFSP’s best response. With regret minimization, the optimality gap decays monotonically and faster than in NFSP. We prove that applying regret minimization methods to NFSP yields a better learning bound than that of the original NFSP. Experiments on three typical environments in OpenSpiel demonstrate that RM-FSP outperforms NFSP in both exploitability (the discrepancy between the learned policy profile and the Nash equilibrium) and time efficiency.
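To make the notions of regret minimization and exploitability concrete, the following is a minimal tabular sketch and not the RM-FSP algorithm itself. It assumes regret matching as the regret minimization method (the abstract does not say which method RM-FSP uses) and uses rock-paper-scissors rather than the OpenSpiel environments from the experiments. All function names are hypothetical, and exploitability is taken here as the average best-response gain of the two players, which is one common convention.

```python
# Illustrative sketch only: tabular regret matching in self-play on
# rock-paper-scissors. This is NOT the neural RM-FSP method.

# Payoff to the row player; the column player receives the negative.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N_ACTIONS = 3


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values_row(col_strategy):
    """Expected payoff of each row action against a column strategy."""
    return [sum(PAYOFF[a][b] * col_strategy[b] for b in range(N_ACTIONS))
            for a in range(N_ACTIONS)]


def action_values_col(row_strategy):
    """Expected payoff of each column action against a row strategy."""
    return [sum(-PAYOFF[a][b] * row_strategy[a] for a in range(N_ACTIONS))
            for b in range(N_ACTIONS)]


def exploitability(row_strategy, col_strategy):
    """Average best-response gain of both players (assumed convention)."""
    best_row = max(action_values_row(col_strategy))
    best_col = max(action_values_col(row_strategy))
    return (best_row + best_col) / 2.0


def self_play(iterations, initial_row_regrets=(1.0, 0.0, 0.0)):
    """Run regret matching for both players; return their average strategies.

    The row player starts with a bias toward the first action so that the
    dynamics are not trivially stuck at the uniform strategy.
    """
    regrets = [list(initial_row_regrets), [0.0] * N_ACTIONS]
    strategy_sums = [[0.0] * N_ACTIONS, [0.0] * N_ACTIONS]
    for _ in range(iterations):
        row = regret_matching_strategy(regrets[0])
        col = regret_matching_strategy(regrets[1])
        values = [action_values_row(col), action_values_col(row)]
        for player, current in enumerate((row, col)):
            expected = sum(p * v for p, v in zip(current, values[player]))
            for a in range(N_ACTIONS):
                # Regret: how much better action a would have done.
                regrets[player][a] += values[player][a] - expected
                strategy_sums[player][a] += current[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for steps in (10, 100, 1000, 10000):
        avg_row, avg_col = self_play(steps)
        print(steps, exploitability(avg_row, avg_col))
```

In a two-player zero-sum game, the exploitability of the average strategies is bounded by the players' average regrets, so it shrinks as the number of iterations grows. This sketch shows only that general relationship; it says nothing about the monotonic decay of the optimality gap claimed for RM-FSP.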
