Abstract: Artificial Intelligence (AI) has achieved several breakthroughs in perfect- and imperfect-information games such as Go, Texas Hold'em, and StarCraft II. However, the Chinese poker game DouDiZhu presents new challenges for AI systems, including inferring imperfect information, training with sparse rewards, and handling a large state-action space. This article describes our proposed DouDiZhu AI system, RARSMSDou, which is based on Deep Reinforcement Learning (DRL) and combines Proximal Policy Optimization (PPO), Relative Advantage Reward Shaping with Minimum Splits (RARSMS), and Deep Monte-Carlo (DMC) in a self-play framework. In RARSMSDou, we propose RARSMS as a novel intrinsic reward to guide PPO training in a sparse-reward environment. We treat the imperfect information as observable information and feed it into the critic network of PPO, and we propose abstract actions to reduce the large action space (27,472 actions) to a low-dimensional action space (309 actions, comprising 189 specific actions and 120 abstract actions) that is output by the policy network of PPO. When the policy network selects an abstract action, DMC (DouZeroX) maps it to a specific action, which is then used as the policy for training or execution. We compare the performance of RARSMSDou with its four variants (PPO, PPO+RARSMS, PPO+DMC, and DMC (DouZeroX)) and five state-of-the-art DouDiZhu AI programs. The experimental results show that after 30 days of self-play and training, RARSMSDou outperforms its variants and DouZero (with a WP of 0.582 and an ADP of 0.414), which is the best DouDiZhu baseline.
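
The two-stage action selection described in the abstract can be illustrated with a minimal Python sketch. Only the action counts (189 specific, 120 abstract, 309 in total) come from the abstract. The id layout, the greedy selection, the `expansions` table, and the `q_value` callable are hypothetical stand-ins for the PPO policy network and the DMC (DouZeroX) value network, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

NUM_SPECIFIC = 189                                   # from the abstract
NUM_ABSTRACT = 120                                   # from the abstract
NUM_POLICY_ACTIONS = NUM_SPECIFIC + NUM_ABSTRACT     # 309 policy outputs


def is_abstract(action_id: int) -> bool:
    # Assumed layout: ids 0..188 are specific, ids 189..308 are abstract.
    return action_id >= NUM_SPECIFIC


def choose_move(
    probs: Sequence[float],
    legal_ids: List[int],
    expansions: Dict[int, List[str]],
    q_value: Callable[[str], float],
) -> Tuple[int, Optional[str]]:
    """Pick a policy action, then expand it if it is abstract.

    probs      -- policy output over the 309 low-dimensional actions
    legal_ids  -- ids that are legal in the current state
    expansions -- hypothetical table: abstract id -> concrete candidate moves
    q_value    -- hypothetical DMC-style scorer for a concrete move
    """
    # Stage 1: greedy choice among legal low-dimensional actions.
    action_id = max(legal_ids, key=lambda i: probs[i])
    if not is_abstract(action_id):
        return action_id, None  # already a specific action
    # Stage 2 (assumption): the value model ranks the concrete candidates
    # and the highest-scoring one is played.
    best_move = max(expansions[action_id], key=q_value)
    return action_id, best_move


# Toy usage with made-up numbers.
probs = [0.0] * NUM_POLICY_ACTIONS
probs[5], probs[200] = 0.1, 0.9
toy_q = {"33344": 0.1, "33355": 0.4}
result = choose_move(probs, [5, 200], {200: ["33344", "33355"]}, toy_q.get)
```

In this sketch the policy only ever scores 309 outputs, and the larger set of concrete moves is consulted only after an abstract action has been chosen.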

Mastering "Gongzhu" with Self-play Deep Reinforcement Learning

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Mastering the Game of Guandan with Deep Reinforcement Learning and Behavior Regulating

DanZero+: Dominating the GuanDan Game through Reinforcement Learning

DanZero: Mastering GuanDan Game with Reinforcement Learning

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

Full DouZero+: Improving DouDizhu AI by Opponent Modeling, Coach-Guided Training and Bidding Learning

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

DouRN: Improving DouZero by Residual Neural Networks

RARSMSDou: Master the Game of DouDiZhu With Deep Reinforcement Learning Algorithms

Suphx: Mastering Mahjong with Deep Reinforcement Learning

ScrofaZero: Mastering Trick-taking Poker Game Gongzhu by Deep Reinforcement Learning

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games

A Deep Reinforcement Learning-Based Approach in Porker Game

Mastering Strategy Card Game (Hearthstone) with Improved Techniques

Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Deep reinforcement learning algorithm based on multi-agent parallelism and its application in game environment

Mastering Chinese Chess AI (Xiangqi) Without Search

Teaching Deep Convolutional Neural Networks to Play Go