Abstract: Effective action abstraction is crucial in tackling challenges associated with large action spaces in Imperfect Information Extensive-Form Games (IIEFGs). However, due to the vast state space and computational complexity in IIEFGs, existing methods often rely on fixed abstractions, resulting in sub-optimal performance. In response, we introduce RL-CFR, a novel reinforcement learning (RL) approach for dynamic action abstraction. RL-CFR builds upon our innovative Markov Decision Process (MDP) formulation, with states corresponding to public information and actions represented as feature vectors indicating specific action abstractions. The reward is defined as the expected payoff difference between the selected and default action abstractions. RL-CFR constructs a game tree with RL-guided action abstractions and utilizes counterfactual regret minimization (CFR) for strategy derivation. Impressively, it can be trained from scratch, achieving higher expected payoff without increased CFR solving time. In experiments on Heads-up No-limit Texas Hold'em, RL-CFR outperforms a replication of ReBeL and Slumbot, demonstrating significant win-rate margins of $64\pm 11$ and $84\pm 17$ mbb/hand, respectively.
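To make the "large action spaces" problem concrete, the sketch below counts decision nodes in one simplified betting round as the number of abstract bet sizes `k` grows. It is a toy model under stated assumptions (a single street, two players, a cap on the total number of bets and raises, no chance nodes), not the HUNL game tree used in the paper, and `betting_round_nodes` is a hypothetical helper.

```python
from functools import lru_cache


def betting_round_nodes(num_sizes: int, max_bets: int) -> int:
    """Count decision nodes in a toy two-player betting round.

    Hypothetical simplified model (not full HUNL): the first player may check
    or bet; after a check, the second player may check (ending the round) or
    bet; a player facing a bet may fold, call, or raise. Every bet or raise
    uses one of `num_sizes` abstract sizes and consumes one of `max_bets`.
    """

    @lru_cache(maxsize=None)
    def nodes(facing_bet: bool, checked: bool, bets_left: int) -> int:
        count = 1  # the current decision node
        if facing_bet:
            # Fold and call end the round; each raise size opens a new node.
            if bets_left > 0:
                count += num_sizes * nodes(True, checked, bets_left - 1)
        else:
            if not checked:
                # A first check hands the decision to the opponent.
                count += nodes(False, True, bets_left)
            # A second check ends the round, so it adds no node.
            if bets_left > 0:
                count += num_sizes * nodes(True, checked, bets_left - 1)
        return count

    return nodes(False, False, max_bets)


for k in (1, 2, 3, 5):
    print(k, betting_round_nodes(k, max_bets=4))
```

With a cap of four bets, the count rises from 10 decision nodes for one size to 1,562 for five sizes; it grows roughly as `k` raised to the bet cap, which is why a small action abstraction is needed before CFR can solve the tree.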

What problem does this paper attempt to address?

This paper addresses how to perform effective action abstraction to cope with the large action spaces of Imperfect Information Extensive-Form Games (IIEFGs). Existing methods usually rely on fixed action abstractions, which leads to sub-optimal performance. The paper proposes RL-CFR, a reinforcement-learning-based method that selects action abstractions dynamically, aiming to improve performance in IIEFGs while keeping computational cost low.

### Main contributions of the paper:

1. **Innovative MDP modeling**: The paper formulates a Markov Decision Process (MDP) tailored to IIEFGs. The state represents public information, the action is a feature vector that specifies a particular action abstraction, and the reward is the expected payoff difference between the selected action abstraction and the default fixed one. Because the abstraction is chosen per state, it can adapt to the situation instead of staying fixed.
2. **RL-CFR framework**: Building on this MDP, RL-CFR combines deep reinforcement learning (DRL) with counterfactual regret minimization (CFR): the RL policy selects the action abstraction used to build the game tree, and CFR computes the strategy on that tree. This raises the expected payoff without increasing CFR solving time. RL-CFR can also be trained from scratch and requires only the rules of the IIEFG.
3. **Evaluation on HUNL poker**: The paper evaluates RL-CFR on two-player Heads-up No-limit Texas Hold'em (HUNL). RL-CFR significantly outperforms both a replication of ReBeL, a HUNL agent based on fixed action abstractions, and the well-known HUNL agent Slumbot, with win-rate margins of 64 ± 11 and 84 ± 17 mbb/hand (milli-big-blinds per hand) over more than 600,000 and 250,000 hands, respectively.
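The MDP reward can be sketched in a few lines. This is a minimal sketch under loud assumptions: the feature vector is decoded as pot-fraction bet sizes (`abstraction_from_features` is hypothetical), and a closed-form toy game (`toy_value`: a half-street spot where the bettor holds the nuts or air with equal probability against a bluff-catcher) stands in for the CFR solve that RL-CFR uses to evaluate an abstraction.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PublicState:
    """Hypothetical public information at a decision point, in chips."""
    pot: float
    stack: float  # effective stack behind


def abstraction_from_features(state: PublicState, features: Sequence[float]) -> List[float]:
    """Hypothetical decoding: each feature is a bet size as a pot fraction,
    clipped to the effective stack; non-positive entries are ignored."""
    return sorted({min(f * state.pot, state.stack) for f in features if f > 0})


def toy_value(state: PublicState, bet_sizes: Sequence[float]) -> float:
    """Stand-in for a CFR solve: the bettor's equilibrium EV in a toy game.

    The bettor holds the nuts or air (probability 1/2 each). Facing a bet B
    into pot P, the bluff-catcher calls with probability P / (P + B), so the
    nuts earn P + P * B / (P + B) and air earns 0 (bluffing and checking tie).
    With several sizes available, the best single size sets the value.
    """
    best = 0.5 * state.pot  # checking everything: nuts win the pot, air loses it
    for bet in bet_sizes:
        nuts_ev = state.pot + state.pot * bet / (state.pot + bet)
        best = max(best, 0.5 * nuts_ev)
    return best


def reward(state: PublicState, selected: Sequence[float], default: Sequence[float]) -> float:
    """MDP reward: expected payoff of the selected abstraction minus the default one."""
    value_selected = toy_value(state, abstraction_from_features(state, selected))
    value_default = toy_value(state, abstraction_from_features(state, default))
    return value_selected - value_default


state = PublicState(pot=10.0, stack=50.0)
print(reward(state, selected=(1.0, 2.0), default=(0.5, 1.0)))  # about 0.833
```

The reward here is positive because the selected abstraction contains a larger bet, which this toy game favors. In RL-CFR, the two values would instead come from CFR solutions of the game trees built with each abstraction.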
### Key problems solved:

- **Large action space**: IIEFGs usually offer many actions at each decision point (in HUNL, for example, a wide range of bet sizes), so the game tree grows exponentially and solving it becomes very expensive. Action abstraction keeps the tree tractable, and by choosing the abstraction dynamically, RL-CFR obtains better abstract actions without increasing CFR solving time.
- **Limitations of fixed action abstractions**: Most existing methods rely on fixed action abstractions, which restrict how far the resulting strategy can be optimized. RL-CFR removes this restriction by selecting the abstraction for each public state, which yields a higher expected payoff.
- **Handling of mixed strategies**: Optimal strategies in IIEFGs are usually mixed, whereas traditional reinforcement learning algorithms mainly learn deterministic policies. RL-CFR handles this by dividing the work: DRL only chooses the action abstraction, and CFR computes the mixed strategy over the resulting abstract actions.

In conclusion, the RL-CFR framework offers an effective approach to the large action spaces and high computational complexity of IIEFGs, and it shows significant gains over strong baselines in HUNL experiments.
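Why leave the mixing to CFR can be illustrated with regret matching, the update CFR applies at every information set; on a one-shot zero-sum matrix game, CFR reduces to regret-matching self-play. This is a minimal sketch on a hypothetical 2x2 game (not from the paper) that has no pure-strategy equilibrium, so any deterministic choice is exploitable, while the average strategies converge to the mixed equilibrium.

```python
def regret_matching(cum_regret):
    """Play actions in proportion to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def solve_zero_sum(payoff, iterations=100_000):
    """Regret-matching self-play on a zero-sum matrix game (row player's payoffs).

    Returns the average strategies, which converge to a Nash equilibrium.
    """
    rows, cols = len(payoff), len(payoff[0])
    regret_row, regret_col = [0.0] * rows, [0.0] * cols
    sum_row, sum_col = [0.0] * rows, [0.0] * cols
    for _ in range(iterations):
        x = regret_matching(regret_row)
        y = regret_matching(regret_col)
        # Expected payoff of each pure action against the opponent's current mix.
        u_row = [sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        v_row = sum(x[i] * u_row[i] for i in range(rows))
        v_col = sum(y[j] * u_col[j] for j in range(cols))
        # Accumulate regrets and the strategies whose average is returned.
        for i in range(rows):
            regret_row[i] += u_row[i] - v_row
            sum_row[i] += x[i]
        for j in range(cols):
            regret_col[j] += u_col[j] - v_col
            sum_col[j] += y[j]
    return [s / iterations for s in sum_row], [s / iterations for s in sum_col]


# Hypothetical 2x2 game with no pure-strategy equilibrium.
# Equilibrium: both players play their first action with probability 0.4; value 0.2.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]
row_strategy, col_strategy = solve_zero_sum(PAYOFF)
print(row_strategy, col_strategy)
```

Both average strategies approach 0.4 on the first action, and a pure choice of either row would concede -1 against the best response, well below the game value of 0.2.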

RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning
