Solving Imperfect Information Poker Games Using Monte Carlo Search and POMDP Models

Jian Yao, Zeyu Zhang, Li Xia, Jun Yang, Qianchuan Zhao
DOI: https://doi.org/10.1109/ddcls49620.2020.9275053
2020-01-01
Abstract: Recent advances in reinforcement learning have led to AI algorithms capable of beating world champions in some perfect information games such as Chess and Go. However, the AI approach to imperfect information games (such as Poker) is much more difficult, because estimating hidden information and the behavior of opponents can be extremely challenging. Since the Markov Decision Process (MDP) is the underlying mathematical model of reinforcement learning in perfect information games, the Partially Observable Markov Decision Process (POMDP) deserves research attention for studying games with imperfect information. In this paper, we study a 16-card Rhode Island Hold'em poker game and present a POMDP model to formulate this imperfect information extensive-form game. Based on the POMDP model, we use a Bayesian approach to estimate the opponent's hand and transform the original problem into several perfect information games. Furthermore, to handle the prohibitive storage and computation requirements, we develop a Monte Carlo optimization algorithm to estimate the action values of the POMDP model. Finally, we conduct numerical experiments on the Rhode Island Hold'em poker game to demonstrate the effectiveness of our approach.
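The two ideas named in the abstract, a Bayesian estimate of the opponent's hand and a Monte Carlo estimate of action values, can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The deck layout (4 ranks by 4 suits), the opponent model `toy_likelihood`, the payoff `toy_payoff`, and all function names are hypothetical assumptions chosen only to make the example runnable.

```python
import random

# Assumption: a 16-card deck of 4 ranks x 4 suits; the paper's deck may differ.
DECK = [(rank, suit) for rank in range(4) for suit in range(4)]
MY_CARD = (2, 0)  # hypothetical private card of the deciding player


def uniform_prior(known_cards):
    """Uniform belief over every card the player cannot see."""
    unseen = [c for c in DECK if c not in known_cards]
    return {c: 1.0 / len(unseen) for c in unseen}


def bayes_update(belief, action, likelihood):
    """Posterior P(card | action), proportional to P(action | card) * P(card)."""
    unnorm = {c: p * likelihood(action, c) for c, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0.0:  # action impossible under the model: keep the prior
        return dict(belief)
    return {c: p / total for c, p in unnorm.items()}


def mc_action_value(belief, action, payoff, n_samples, rng):
    """Average payoff of `action` over opponent hands sampled from the belief."""
    cards = list(belief)
    weights = [belief[c] for c in cards]
    hands = rng.choices(cards, weights=weights, k=n_samples)
    return sum(payoff(action, h) for h in hands) / n_samples


def toy_likelihood(action, card):
    """Hypothetical opponent model: higher ranks bet more often."""
    p_bet = 0.2 + 0.2 * card[0]
    return p_bet if action == "bet" else 1.0 - p_bet


def toy_payoff(action, opp_card):
    """Hypothetical payoff: folding is 0, calling is +1/-1/0 by rank comparison."""
    if action == "fold":
        return 0.0
    if MY_CARD[0] > opp_card[0]:
        return 1.0
    if MY_CARD[0] < opp_card[0]:
        return -1.0
    return 0.0


prior = uniform_prior({MY_CARD})                        # 15 unseen cards
posterior = bayes_update(prior, "bet", toy_likelihood)  # opponent was seen betting
rng = random.Random(0)                                  # seeded for repeatability
call_value = mc_action_value(posterior, "call", toy_payoff, 10_000, rng)
fold_value = mc_action_value(posterior, "fold", toy_payoff, 10_000, rng)
```

Each sampled opponent hand turns the decision into a perfect information evaluation, and averaging over samples replaces an exhaustive table of action values. Under these toy models the exact value of calling is -0.8/7.4, so the sampled `call_value` should land close to that figure while `fold_value` is exactly 0.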