Abstract: In two-player zero-sum games, if both players minimize their average external regret, then the average of the strategy profiles converges to a Nash equilibrium. For n-player general-sum games, however, theoretical guarantees for regret minimization are less well understood. Nonetheless, Counterfactual Regret Minimization (CFR), a popular regret minimization algorithm for extensive-form games, has generated winning three-player Texas Hold'em agents in the Annual Computer Poker Competition (ACPC). In this paper, we provide the first set of theoretical properties for regret minimization algorithms in non-zero-sum games by proving that their solutions eliminate iterative strict domination. We formally define \emph{dominated actions} in extensive-form games, show that CFR avoids iteratively strictly dominated actions and strategies, and demonstrate that removing iteratively dominated actions is enough to win a mock tournament in a small poker game. In addition, for two-player non-zero-sum games, we bound the worst-case performance and show that, in practice, regret minimization can yield strategies very close to equilibrium. Our theoretical advancements lead us to a new modification of CFR for games with more than two players that is more efficient and may be used to generate stronger strategies than previously possible. Furthermore, we present a new three-player Texas Hold'em poker agent that was built using CFR and a novel game decomposition method. Our new agent wins the three-player events of the 2012 ACPC and defeats the winning three-player programs from previous competitions while requiring fewer resources to generate than the 2011 winner. Finally, we show that our CFR modification computes a strategy of equal quality to our new agent in a quarter of the time of standard CFR, using half the memory.
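The abstract's opening claim and its dominance result can both be seen on tiny normal-form games using regret matching, the update CFR applies at each information set. The sketch below assumes a two-player bimatrix game rather than an extensive-form game. The function names (`regret_matching`, `self_play`, `nash_conv`) and both example games are illustrative choices, not taken from the paper. In the zero-sum game the average strategies approach the unique equilibrium (probability 2/5 on action 0 for both players). In the general-sum prisoner's dilemma, the strictly dominated action 0 ("cooperate") is played only on the first, uniform iteration.

```python
def regret_matching(cum_regret):
    """Strategy proportional to positive cumulative regret; uniform if no regret is positive."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def expected_utilities(row_payoff, col_payoff, s_row, s_col):
    """Per-action expected utilities against the opponent's mixed strategy, plus each player's value."""
    n_rows, n_cols = len(s_row), len(s_col)
    u_row = [sum(row_payoff[i][j] * s_col[j] for j in range(n_cols)) for i in range(n_rows)]
    u_col = [sum(col_payoff[i][j] * s_row[i] for i in range(n_rows)) for j in range(n_cols)]
    v_row = sum(s * u for s, u in zip(s_row, u_row))
    v_col = sum(s * u for s, u in zip(s_col, u_col))
    return u_row, u_col, v_row, v_col


def self_play(row_payoff, col_payoff, iterations):
    """Simultaneous regret matching for both players; payoff[i][j] = payoff when row plays i, column j.

    Returns the average strategy of each player (the quantity that converges in zero-sum games).
    """
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    regret_row, regret_col = [0.0] * n_rows, [0.0] * n_cols
    sum_row, sum_col = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        s_row, s_col = regret_matching(regret_row), regret_matching(regret_col)
        u_row, u_col, v_row, v_col = expected_utilities(row_payoff, col_payoff, s_row, s_col)
        # External regret: gain from having always played action a instead of the current strategy.
        for i in range(n_rows):
            regret_row[i] += u_row[i] - v_row
            sum_row[i] += s_row[i]
        for j in range(n_cols):
            regret_col[j] += u_col[j] - v_col
            sum_col[j] += s_col[j]
    return [x / iterations for x in sum_row], [x / iterations for x in sum_col]


def nash_conv(row_payoff, col_payoff, s_row, s_col):
    """Total gain available to players from unilateral best responses (0 at a Nash equilibrium)."""
    u_row, u_col, v_row, v_col = expected_utilities(row_payoff, col_payoff, s_row, s_col)
    return (max(u_row) - v_row) + (max(u_col) - v_col)


# Zero-sum example: the unique equilibrium plays action 0 with probability 2/5 for both players.
ZS_ROW = [[2.0, -1.0], [-1.0, 1.0]]
ZS_COL = [[-x for x in row] for row in ZS_ROW]
zs_row, zs_col = self_play(ZS_ROW, ZS_COL, 20_000)
print(zs_row, zs_col, nash_conv(ZS_ROW, ZS_COL, zs_row, zs_col))

# General-sum example (prisoner's dilemma): action 0 (cooperate) is strictly dominated for both.
PD_ROW = [[3.0, 0.0], [5.0, 1.0]]
PD_COL = [[3.0, 5.0], [0.0, 1.0]]
pd_row, pd_col = self_play(PD_ROW, PD_COL, 1_000)
print(pd_row[0], pd_col[0])  # each equals 0.5 / 1000
```

In the prisoner's dilemma, the dominated action accumulates negative regret from the first iteration onward. Regret matching therefore never selects it again, and its average probability shrinks like 0.5/T. This is a one-shot illustration of the kind of dominance property the abstract describes. It is not the paper's extensive-form proof.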

Deep Counterfactual Regret Minimization

Hierarchical Deep Counterfactual Regret Minimization

D2CFR: Minimize Counterfactual Regret With Deep Dueling Neural Network

RLCFR: Minimize counterfactual regret by deep reinforcement learning

Double Neural Counterfactual Regret Minimization

Solving Imperfect-Information Games Via Exponential Counterfactual Regret Minimization

No-Regret Learning in Extensive-Form Games with Imperfect Recall

GPU-Accelerated Counterfactual Regret Minimization

Kdb-D2CFR: Solving Multiplayer imperfect-information games with knowledge distillation-based DeepCFR

Efficient CFR for Imperfect Information Games with Instant Updates

Imization for extensive games with imperfect information

Lazy-CFR: a Fast Regret Minimization Algorithm for Extensive Games with Imperfect Information

Model-Free Neural Counterfactual Regret Minimization with Bootstrap Learning

CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Unknown Environments

Modeling Other Players with Bayesian Beliefs for Games with Incomplete Information

RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games

Regret Minimization in Non-Zero-Sum Games with Applications to Building Champion Multiplayer Computer Poker Agents

Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent