Mastering "Gongzhu" with Self-play Deep Reinforcement Learning.

Licheng Wu, Qifei Wu, Hongming Zhong, Xiali Li
DOI: https://doi.org/10.1007/978-981-99-0617-8_11
2022-01-01
Abstract: “Gongzhu” is a card game popular among Chinese communities in China and abroad, and it is an incomplete-information game. Play is highly reversible, and the state and action spaces are complex. This paper proposes an algorithm that combines the Monte-Carlo (MC) method with deep neural networks, called the Deep Monte-Carlo (DMC) algorithm. Unlike the traditional MC algorithm, it uses a Deep Q-Network (DQN) instead of a Q-table to update Q-values, and it builds the model with a distributed parallel training framework, which effectively addresses the problems of computational complexity and limited resources. After 24 h of training on a server with 1 GPU, the “Gongzhu” agent played 10,000 games against an agent that uses a Convolutional Neural Network (CNN) to fit the strategies of human players. The “Gongzhu” agent achieved a 72.6% win rate and averaged 63 points per game. The experimental results show that the model performs better than the CNN-based agent.
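The core idea in the abstract, replacing the Q-table of Monte-Carlo control with a learned function that is regressed toward sampled episode returns, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not describe the network architecture, the card features, or the distributed framework, so a small linear model (`LinearQ`) stands in for the deep Q-network and a toy two-action task (`train_toy`) stands in for “Gongzhu”.

```python
import random


def mc_returns(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step of a finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class LinearQ:
    """Hypothetical stand-in for the deep Q-network: Q(s, a) = w . phi(s, a)."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, features):
        return sum(w * x for w, x in zip(self.w, features))

    def update(self, features, target):
        # One gradient step on the squared error 0.5 * (Q - G)^2.
        error = self.predict(features) - target
        self.w = [w - self.lr * error * x for w, x in zip(self.w, features)]


def epsilon_greedy(q, candidates, epsilon, rng):
    """Pick an index into `candidates` (feature vectors of the legal actions)."""
    if rng.random() < epsilon:
        return rng.randrange(len(candidates))
    values = [q.predict(f) for f in candidates]
    return values.index(max(values))


def train_toy(episodes=500, steps=3, epsilon=0.2, seed=0):
    """Toy task (not Gongzhu): two actions per step, score revealed only at the end."""
    rng = random.Random(seed)
    q = LinearQ(n_features=2)
    candidates = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features for actions 0 and 1
    for _ in range(episodes):
        # Play a whole episode with the current Q estimate, then learn from it.
        chosen = [epsilon_greedy(q, candidates, epsilon, rng) for _ in range(steps)]
        # Terminal-only reward: +1 for each action 1 played, -1 for each action 0.
        final_score = float(sum(1 if a == 1 else -1 for a in chosen))
        rewards = [0.0] * (steps - 1) + [final_score]
        for a, g in zip(chosen, mc_returns(rewards)):
            q.update(candidates[a], g)  # regress Q(s, a) toward the sampled return
    return q


if __name__ == "__main__":
    q = train_toy()
    print("learned weights:", q.w)
```

The point of the sketch is the update target: each visited state-action pair is pushed toward the full return observed at the end of the episode, rather than toward a bootstrapped one-step estimate. Such a target suits games scored only when a hand finishes. A real implementation would swap `LinearQ` for a neural network and run many self-play actors in parallel.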