Abstract: Poker has long been considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, as are many real-world problems such as auctions, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy is wise in multi-player games, and it is infeasible to validate theoretically whether a policy is optimal. Designing an effective method for learning an approximately optimal policy therefore has greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, it builds an Actor network that makes decisions with imperfect information and a Critic network that evaluates policies with perfect information. Second, it proposes two novel policy update methods for multi-player poker: the asynchronous policy update algorithm (APU) for multi-player multi-policy scenarios and the dual-network asynchronous policy update algorithm (Dual-APU) for multi-player sharing-policy scenarios. Finally, it validates the proposed method on six-player Texas hold 'em, the most popular form of poker. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with existing approaches. In sum, policy learning methods for imperfect information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect information games. Training with perfect information and testing with imperfect information offers an effective and explainable approach to learning an approximately optimal policy.
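The split between an Actor that decides with imperfect information and a Critic that evaluates with perfect information can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the game is a hypothetical one-shot, three-card toy rather than six-player Texas hold 'em, the Actor and Critic are lookup tables rather than networks, and the update is a plain advantage-based policy gradient, not APU or Dual-APU. The names `payoff`, `train`, and `bet_probability` are hypothetical.

```python
import math
import random

ACTIONS = ("fold", "bet")   # hypothetical two-action toy game
CARDS = (0, 1, 2)           # higher card wins a showdown


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def payoff(action, own, opp):
    """Toy reward (an assumption): folding costs 0.5, betting wins or loses 1."""
    if action == "fold":
        return -0.5
    return float((own > opp) - (own < opp))


def train(episodes=20000, lr_actor=0.1, lr_critic=0.1, seed=0):
    rng = random.Random(seed)
    # Actor: indexed by the player's OWN card only (imperfect information).
    actor = {c: [0.0, 0.0] for c in CARDS}
    # Critic: indexed by BOTH cards (perfect information, training only).
    critic = {(c, o): 0.0 for c in CARDS for o in CARDS}
    for _ in range(episodes):
        own, opp = rng.choice(CARDS), rng.choice(CARDS)
        probs = softmax(actor[own])
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = payoff(ACTIONS[a], own, opp)
        # Advantage uses the critic's value of the fully observed state.
        advantage = r - critic[(own, opp)]
        critic[(own, opp)] += lr_critic * advantage
        # Policy-gradient step: d log pi(a) / d logit_i = 1[i == a] - pi(i).
        for i in range(len(ACTIONS)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            actor[own][i] += lr_actor * advantage * grad
    return actor, critic


def bet_probability(actor, card):
    """Probability that the trained actor bets when holding `card`."""
    return softmax(actor[card])[1]
```

After `actor, critic = train()`, only `actor` is needed to play: `bet_probability(actor, 2)` depends on the player's own card alone, which mirrors the idea of training with perfect information and testing with imperfect information.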

A Q-based Policy Gradient Optimization Approach for Doudizhu

RARSMSDou: Master the Game of DouDiZhu With Deep Reinforcement Learning Algorithms

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

DouRN: Improving DouZero by Residual Neural Networks

A Deep Reinforcement Learning-Based Approach in Porker Game

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

Full DouZero+: Improving DouDizhu AI by Opponent Modeling, Coach-Guided Training and Bidding Learning

PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game

A Human Mixed Strategy Approach to Deep Reinforcement Learning

DanZero+: Dominating the GuanDan Game through Reinforcement Learning

A Survey of Deep Reinforcement Learning in Video Games

Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning

Efficient Opponent Exploitation in No-Limit Texas Hold’em Poker: A Neuroevolutionary Method Combined with Reinforcement Learning

Regularly Updated Deterministic Policy Gradient Algorithm

A Problem In Psychoanalytic Technique

Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

Using Deep Q-Learning to Control Optimization Hyperparameters

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Deep Q-learning From Demonstrations