Abstract: Artificial Intelligence (AI) has achieved several breakthroughs in perfect- and imperfect-information games such as Go, Texas Hold'em, and StarCraft II. However, the Chinese poker game DouDiZhu poses new challenges for AI systems, including inferring imperfect information, training with sparse rewards, and handling a large state-action space. This article describes our proposed DouDiZhu AI system, RARSMSDou, which is based on Deep Reinforcement Learning (DRL) and combines Proximal Policy Optimization (PPO), Relative Advantage Reward Shaping with Minimum Splits (RARSMS), and Deep Monte-Carlo (DMC) in a self-play framework. In RARSMSDou, we propose RARSMS as a novel intrinsic reward that guides PPO training in a sparse-reward environment. We treat the imperfect information as observable information and feed it into the critic network of PPO. We also propose abstract actions that reduce the large action space (27,472 actions) to a low-dimensional action space of 309 actions (189 specific actions and 120 abstract actions), which is the output of PPO's policy network. When the policy network outputs an abstract action, DMC (DouZeroX) maps it to a specific action, which is then used for training or execution. We compare RARSMSDou with four of its variants (PPO, PPO+RARSMS, PPO+DMC, and DMC (DouZeroX)) and with five state-of-the-art DouDiZhu AI programs. The experimental results show that, after 30 days of self-play training, RARSMSDou outperforms its variants and DouZero, the strongest DouDiZhu baseline, with a winning percentage (WP) of 0.582 and an average difference in points (ADP) of 0.414.
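To make the two-level action selection concrete, the sketch below shows one way a 309-way policy output could be resolved into a concrete move. Only the counts (189 specific, 120 abstract, 309 total) come from the abstract. The index layout (specific actions first), the `resolve_action` helper, the grouping of concrete moves under each abstract action, and the toy Q-values are all hypothetical illustrations, not the authors' implementation. The DMC step is reduced to a greedy argmax over a Q-function, as in DouZero-style execution; exploration during training is omitted.

```python
from typing import Callable, Dict, List, Sequence

NUM_SPECIFIC = 189  # from the abstract: actions the policy names directly
NUM_ABSTRACT = 120  # from the abstract: actions resolved by DMC
NUM_POLICY_OUTPUTS = NUM_SPECIFIC + NUM_ABSTRACT  # 309

Move = str  # hypothetical encoding of a concrete card play, e.g. "3334"


def resolve_action(
    policy_index: int,
    specific_moves: Sequence[Move],
    abstract_groups: Dict[int, List[Move]],
    q_value: Callable[[Move], float],
) -> Move:
    """Map a PPO policy output index to a concrete move (hypothetical layout).

    Assumption: indices [0, 189) each name one concrete move, and indices
    [189, 309) are abstract actions whose legal concrete moves are listed
    in `abstract_groups`. `q_value` stands in for a DMC Q-network already
    conditioned on the current state; the highest-scoring move is chosen.
    """
    if not 0 <= policy_index < NUM_POLICY_OUTPUTS:
        raise ValueError(f"policy index {policy_index} out of range")
    if policy_index < NUM_SPECIFIC:
        # Specific action: the policy output already identifies the move.
        return specific_moves[policy_index]
    candidates = abstract_groups.get(policy_index, [])
    if not candidates:
        raise ValueError(f"no legal move for abstract action {policy_index}")
    # Abstract action: greedy DMC-style choice among its concrete moves.
    return max(candidates, key=q_value)


if __name__ == "__main__":
    specific = [f"specific_{i}" for i in range(NUM_SPECIFIC)]
    # Hypothetical abstract action 189: "triple 3s plus one kicker card".
    groups = {189: ["3334", "3335"]}
    toy_q = {"3334": 0.1, "3335": 0.4}
    print(resolve_action(0, specific, groups, toy_q.__getitem__))
    print(resolve_action(189, specific, groups, toy_q.__getitem__))
```

Under this reading, the policy network never has to enumerate every kicker combination: it picks a category, and the value-based DMC component settles the remaining detail, which is the intuition behind shrinking 27,472 actions to 309 policy outputs.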

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

Human-Level Performance in No-Press Diplomacy via Equilibrium Search

No Press Diplomacy: Modeling Multi-Agent Gameplay

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

No-Press Diplomacy from Scratch

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

Mastering the game of Stratego with model-free multiagent reinforcement learning

Negotiation and honesty in artificial intelligence methods for the board game of Diplomacy

Human-AI Coordination via Human-Regularized Search and Learning

Using Graph-Aware Reinforcement Learning to Identify Winning Strategies in Diplomacy Games (Student Abstract)

Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy

Towards automating Codenames spymasters with deep reinforcement learning

RARSMSDou: Master the Game of DouDiZhu With Deep Reinforcement Learning Algorithms

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Efficacy of Language Model Self-Play in Non-Zero-Sum Games

Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates

More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play

Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games