RARSMSDou: Master the Game of DouDiZhu With Deep Reinforcement Learning Algorithms

T. Tan, Qian Luo
DOI: https://doi.org/10.1109/TETCI.2023.3303251
2024-02-01
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract: Artificial Intelligence (AI) has achieved several breakthroughs in perfect- and imperfect-information games such as Go, Texas Hold'em, and StarCraft II. However, the Chinese poker game DouDiZhu presents new challenges for AI systems, including inferring imperfect information, training with sparse rewards, and handling a large state-action space. This article describes RARSMSDou, our proposed DouDiZhu AI system based on Deep Reinforcement Learning (DRL), which combines Proximal Policy Optimization (PPO), Relative Advantage Reward Shaping with Minimum Splits (RARSMS), and Deep Monte-Carlo (DMC) in a self-play framework. In RARSMSDou, we propose RARSMS as a novel intrinsic reward that guides PPO training in a sparse-reward environment. We treat the imperfect information as observable information and feed it into the critic network of PPO, and we propose abstract actions that reduce the large action space (27,472 actions) to a low-dimensional action space of 309 actions (189 specific actions and 120 abstract actions), which is output by the policy network of PPO. When the policy selects an abstract action, DMC (DouZeroX) maps that abstract action to a specific action for training or execution. We compare the performance of RARSMSDou with its four variants (PPO, PPO+RARSMS, PPO+DMC, and DMC (DouZeroX)) and five state-of-the-art DouDiZhu AI programs. The experimental results show that after 30 days of self-play and training, RARSMSDou outperforms its variants and DouZero, the strongest DouDiZhu baseline, with a winning percentage (WP) of 0.582 and an average difference in points (ADP) of 0.414.
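The two-level action selection described in the abstract (a PPO policy over 309 outputs, with abstract actions resolved to concrete moves by a DMC value function) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the index layout (specific actions first, then abstract ones), the `specific_moves` and `expand` tables, and the `q_value` callable are all hypothetical stand-ins for the real policy network, action encoding, and DouZeroX Q-network.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Sizes taken from the abstract: 189 specific + 120 abstract = 309 policy outputs.
# ASSUMPTION: specific actions occupy indices 0..188, abstract actions 189..308.
NUM_SPECIFIC = 189
NUM_ABSTRACT = 120
NUM_POLICY_OUTPUTS = NUM_SPECIFIC + NUM_ABSTRACT


def masked_softmax(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax over legal entries only; illegal entries get probability 0."""
    top = max(x for x, ok in zip(logits, legal) if ok)
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def select_move(
    logits: Sequence[float],               # stand-in for the PPO policy head output
    legal: Sequence[bool],                 # legality mask over the 309 outputs
    specific_moves: Dict[int, str],        # HYPOTHETICAL: specific index -> concrete move
    expand: Dict[int, List[str]],          # HYPOTHETICAL: abstract index -> legal concrete moves
    q_value: Callable[[str], float],       # stand-in for a DMC Q(s, a) with the state fixed
    rng: random.Random,
) -> str:
    """Sample from the low-dimensional policy; resolve abstract actions greedily by Q."""
    probs = masked_softmax(logits, legal)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    if idx < NUM_SPECIFIC:
        return specific_moves[idx]  # specific action: play it directly
    # Abstract action: the DMC-style value function picks among its concrete moves.
    return max(expand[idx], key=q_value)
```

For example, if only abstract index 200 is legal and it expands to the moves "3334", "5556", and "777J", `select_move` returns whichever of the three `q_value` scores highest. Splitting the choice this way keeps the sampled distribution small (309 entries) while the greedy Q lookup handles the combinatorial detail of which exact cards to play.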
Computer Science