JP-DouZero: an enhanced DouDiZhu AI based on reinforcement learning with peasant collaboration and intrinsic rewards.

Mu Yuan, Nikolaos M. Freris
DOI: https://doi.org/10.1109/BIGCOM61073.2023.00042
2023-01-01
Abstract: DouDiZhu is a popular Chinese three-player card game in which two peasants collaborate against a landlord. The problem is highly complex for a reinforcement learning-based AI owing to three factors: imperfect information, the coexistence of competition and cooperation, and huge state and action spaces. The current state-of-the-art system, DouZero, combines Monte Carlo methods with deep neural networks and learns through self-play without human expertise. This paper proposes JP-DouZero, which addresses two shortcomings of existing methods: a) the cooperation between the two peasants is not explicitly modeled, and b) the reward is sparse, i.e., state-action trajectories receive a binary score based only on whether they lead to a win or a loss at the end of the game. For the former, we design a joint peasant Q-network that determines the reward of every state-action pair from the standpoint of the peasant coalition. For the latter, we devise a new reward mechanism comprising three parts: a curiosity-driven reward, a result-driven reward, and an extrinsic reward. Extensive experiments show a significant increase in the peasants' advantage over the DouZero baseline: a 2.2% higher winning rate and a difference in scored points more than 0.12 higher. An ablation study shows the impact of each design choice on the overall performance improvement.
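To make the contrast between a sparse binary reward and a three-part reward concrete, the following is a minimal sketch, not the authors' implementation, of how such parts might be combined into per-step Monte Carlo targets for the peasant coalition. Several things here are assumptions not stated in the abstract: the weights in the hypothetical `RewardWeights` class, the curiosity term computed as the prediction error of a forward model (in the style of the Intrinsic Curiosity Module), the result-driven term treated as an opaque per-step signal supplied by the caller, and undiscounted returns (`gamma = 1.0`) as the default.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not give the paper's values.
    extrinsic: float = 1.0
    curiosity: float = 0.1
    result: float = 0.1


def curiosity_reward(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed ICM-style intrinsic reward: half the squared error between a
    forward model's predicted next-state features and the actual features."""
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted, actual))


def shaped_rewards(
    n_steps: int,
    peasants_won: bool,
    curiosity: Sequence[float],
    result: Sequence[float],
    w: Optional[RewardWeights] = None,
) -> List[float]:
    """Per-step reward from the peasant coalition's standpoint.

    The extrinsic term is the sparse binary outcome (+1 win, -1 loss) and is
    given only at the final step; `curiosity` and `result` are hypothetical
    per-step signals supplied by the caller.
    """
    w = w or RewardWeights()
    rewards = []
    for t in range(n_steps):
        outcome = 1.0 if peasants_won else -1.0
        extrinsic = outcome if t == n_steps - 1 else 0.0
        rewards.append(
            w.extrinsic * extrinsic + w.curiosity * curiosity[t] + w.result * result[t]
        )
    return rewards


def monte_carlo_returns(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Monte Carlo targets G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    # Sparse reward only: every step of a won game gets the same target 1.0.
    sparse = shaped_rewards(3, True, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
    print(monte_carlo_returns(sparse))
    # Adding a curiosity signal makes earlier steps' targets differ.
    dense = shaped_rewards(3, True, [0.5, 0.2, 0.0], [0.0, 0.0, 0.0])
    print(monte_carlo_returns(dense))
```

With only the extrinsic term, every state-action pair in a won game receives the identical target, which is the sparsity problem the abstract describes; the extra terms let individual steps be distinguished. Because the rewards are computed for the coalition rather than per player, the same targets could be shared by both peasants' transitions, loosely mirroring the coalition standpoint of the joint peasant Q-network.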
What problem does this paper attempt to address?