A Q-based Policy Gradient Optimization Approach for Doudizhu

Yu Xiaomin, Wang Yisong, Qin Jin, Chen Panfeng
DOI: https://doi.org/10.1007/s10489-022-04281-x
IF: 5.3
2022-01-01
Applied Intelligence
Abstract: Deep reinforcement learning (DRL) has recently achieved superhuman performance in various games, including Atari, Go, and no-limit Texas hold’em. However, it has not been fully explored for Doudizhu, a popular poker game in Asia that involves both confrontation and cooperation among multiple players with imperfect information. In this paper we present NV-Dou, a new deep reinforcement learning approach for Doudizhu. It adopts a variant of neural fictitious self-play to approximate the Nash equilibria of the game. The loss functions of the neural network integrate Q-based policy gradient (mean actor-critic) with advantage learning and proximal policy optimization. In addition, parametric noise is applied to the fully connected layers of the neural network. The experimental results show that NV-Dou needs only a few hours of training and achieves almost state-of-the-art performance compared with well-known open implementations for Doudizhu, including RHCP, CQL, MCTS, and others.
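The Q-based policy gradient named in the abstract (mean actor-critic) can be illustrated with a minimal sketch. This is not the NV-Dou loss: the advantage learning and proximal policy optimization terms are omitted, and the function names `softmax` and `mean_actor_critic_loss` are hypothetical, chosen only for this example. The sketch assumes a single state with a discrete action set, policy logits, and critic estimates Q(s, a) for every action. The loss is the negative expected Q-value under the policy, summed over all actions rather than only the sampled one.

```python
import math


def softmax(logits):
    """Convert raw policy logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_actor_critic_loss(logits, q_values):
    """Negative expected Q-value under the policy for one state.

    Illustrative only: sums pi(a|s) * Q(s, a) over every action,
    which is the core idea of mean actor-critic.
    """
    probs = softmax(logits)
    return -sum(p * q for p, q in zip(probs, q_values))


# Two actions with equal logits, so each has probability 0.5.
loss = mean_actor_critic_loss([0.0, 0.0], [1.0, 3.0])
print(loss)
```

Minimizing this loss with respect to the logits shifts probability toward actions with higher estimated Q-values. In practice the gradient would be computed by an automatic differentiation library rather than by hand.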