Mastering Strategy Card Game (Hearthstone) with Improved Techniques

Changnan Xiao, Yongxin Zhang, Xuefeng Huang, Qinhan Huang, Jie Chen, Peng Sun
DOI: https://doi.org/10.48550/arXiv.2303.05197
2023-05-28
Abstract: Strategy card games are a well-known genre that demands intelligent gameplay and can serve as an ideal testbed for AI. Previous work combines an end-to-end policy function with optimistic smooth fictitious play and shows promising performance on the strategy card game Legend of Code and Magic. In this work, we apply such algorithms to Hearthstone, a famous commercial game with more complicated rules and mechanisms. We further propose several improved techniques and consequently achieve significant progress. For a machine-vs-human test, we invite a Hearthstone streamer whose best rank was in the top 10 of the official league in the China region, which is estimated to have millions of players. Our models defeat the human player in all Best-of-5 tournaments of full games (including both deck building and battle), showing a strong decision-making capability.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to develop an AI model that reaches master level in the complex strategy card game Hearthstone. Specifically, the researchers improve existing algorithms and techniques to overcome the challenges posed by partial observability (incomplete information) in the game, with the following goals:

1. **Improve decision-making ability**: In Hearthstone, players must make decisions at multiple stages, including choosing a hero, building a deck, and battling. These stages add to the game's complexity, so strong decision-making is required to win.
2. **Deal with incomplete information**: Unlike perfect-information games, Hearthstone is partially observable: players cannot see their opponent's deck or hand. Finding a Nash equilibrium (NE) under these conditions is therefore a central problem.
3. **Defeat top human players**: To validate the model, the researchers invited a Hearthstone streamer whose best rank was in the top 10 of the official league in the China region. This player is considered to be in the top 0.0005% of the league's millions of players.

### Main contributions of the paper

- **Improved techniques**: The researchers propose and apply several improved techniques, including adjusting the discount factor \(\gamma\), randomly initializing deck building (Random-CB), balancing data production and consumption to reduce off-policy effects, improving the V-Trace algorithm, isolating models by hero, and "cheating" by using hidden information.
- **Significant progress**: With these improvements, the model achieved a 73.6% win rate in machine-vs-machine evaluations, and in the machine-vs-human evaluation it defeated the top-ranked human player in all Best-of-5 matches.
### Formula summary

- **End-to-end (E2E) policy function**:
  \[ \pi_\theta(\cdot|o)=\delta\,\pi_\theta^{CB}(\cdot|o)+(1-\delta)\,\pi_\theta^{BT}(\cdot|o) \]
  where \(\delta\) is the phase indicator (\(\delta=1\) in the deck-building phase, \(\delta=0\) in battle), and \(\pi_\theta^{CB}\) and \(\pi_\theta^{BT}\) are the deck-building and battle policies respectively, both parameterized by \(\theta\).
- **V-Trace value target**:
  \[ v_t = V_t+\sum_{i=0}^{k}\gamma^i\left[\prod_{j=0}^{i-1}\min(\rho_{t+j},\bar{c})\right]\min(\rho_{t+i},\bar{\rho})\,\delta_{t+i} \]
  where \(\rho_t=\pi_\theta(a_t|o_t)/\mu(a_t|o_t)\) is the importance ratio between the learner policy and the behavior policy \(\mu\), \(\delta_t=r_t+\gamma V_{t+1}-V_t\) is the temporal-difference error (distinct from the phase indicator above), and \(\bar{c}\) and \(\bar{\rho}\) are truncation thresholds on the importance ratios that control the bias-variance trade-off.
- **PPO policy gradient**:
  \[ \nabla_\theta\min\left(\frac{\pi_\theta(a_t|o_t)}{\mu(a_t|o_t)}A_t,\ \text{clip}\left(\frac{\pi_\theta(a_t|o_t)}{\mu(a_t|o_t)},1-\epsilon,1+\epsilon\right)A_t\right),\qquad A_t=r_t+\gamma v_{t+1}-V_t \]

Through these methods, the researchers improved the model's performance and demonstrated strong play in a complex strategy card game.
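The V-Trace target and the clipped PPO objective in the formula summary fit together as follows. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names `vtrace_targets` and `clipped_surrogate` are hypothetical, the input is a single trajectory segment of length \(T\) with log importance ratios \(\log(\pi_\theta/\mu)\), the value after the last step is passed as `bootstrap_value` (0 if the game ended), the V-Trace sum runs to the end of the segment (\(k = T-1-t\)), and the defaults \(\gamma=0.99\), \(\bar{\rho}=\bar{c}=1\), \(\epsilon=0.2\) are illustrative choices. It evaluates the objective only; in training, an autodiff framework would differentiate it with respect to \(\theta\) through the ratio while treating the targets as constants.

```python
import math
from typing import List, Sequence


def vtrace_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    log_rhos: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    rho_bar: float = 1.0,
    c_bar: float = 1.0,
) -> List[float]:
    """Hypothetical helper: V-Trace targets v_t for one trajectory segment.

    rewards[t] = r_t, values[t] = V_t, log_rhos[t] = log(pi(a_t|o_t) / mu(a_t|o_t)).
    bootstrap_value = V_T after the last step (assumed 0.0 if the game ended).
    """
    T = len(rewards)
    values_next = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * T
    acc = 0.0  # holds v_{t+1} - V_{t+1}; zero past the end of the segment
    # Backward recursion equivalent to the sum:
    # v_t - V_t = min(rho_t, rho_bar) * delta_t + gamma * min(rho_t, c_bar) * (v_{t+1} - V_{t+1})
    for t in reversed(range(T)):
        rho = math.exp(log_rhos[t])
        delta = rewards[t] + gamma * values_next[t] - values[t]  # TD error delta_t
        acc = min(rho, rho_bar) * delta + gamma * min(rho, c_bar) * acc
        targets[t] = values[t] + acc
    return targets


def clipped_surrogate(
    rewards: Sequence[float],
    values: Sequence[float],
    targets: Sequence[float],
    log_rhos: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    epsilon: float = 0.2,
) -> float:
    """Hypothetical helper: mean clipped PPO objective with A_t = r_t + gamma * v_{t+1} - V_t."""
    T = len(rewards)
    targets_next = list(targets[1:]) + [bootstrap_value]  # v_{t+1}, bootstrapped at the end
    total = 0.0
    for t in range(T):
        ratio = math.exp(log_rhos[t])  # pi_theta / mu
        adv = rewards[t] + gamma * targets_next[t] - values[t]
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)  # pessimistic (clipped) surrogate
    return total / T


if __name__ == "__main__":
    # Toy segment: the final action wins the game (reward 1), then the episode ends.
    rewards, values, log_rhos = [0.0, 0.0, 1.0], [0.5, 0.5, 0.5], [0.0, 0.0, 0.0]
    v = vtrace_targets(rewards, values, log_rhos, bootstrap_value=0.0, gamma=0.9)
    print("V-Trace targets:", v)
    print("surrogate:", clipped_surrogate(rewards, values, v, log_rhos, 0.0, gamma=0.9))
```

The backward recursion avoids recomputing the product of truncated ratios for every \(t\). With all ratios equal to 1 (on-policy data) and \(\bar{\rho}=\bar{c}=1\), the targets reduce to the ordinary discounted return, which is a quick sanity check.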