Abstract: Strategy card games are a well-known genre that demands intelligent game-play and can serve as an ideal test bench for AI. Previous work combined an end-to-end policy function with optimistic smooth fictitious play and showed promising performance on the strategy card game Legends of Code and Magic. In this work, we apply these algorithms to Hearthstone, a famous commercial game with more complicated rules and mechanisms. We further propose several improved techniques and consequently achieve significant progress. For a machine-vs-human test, we invited a Hearthstone streamer whose best rank was top 10 in the official league of the China region, which is estimated to have millions of players. Our models defeated the human player in all Best-of-5 tournaments of full games (including both deck building and battle), showing a strong capability for decision making.

What problem does this paper attempt to address?

This paper attempts to develop an artificial intelligence model that can reach master level in the complex strategy card game "Hearthstone". Specifically, the researchers improve existing algorithms and techniques to overcome the challenges of partial observability (incomplete information) in the game and to achieve the following goals:

1. **Improve decision-making ability**: In "Hearthstone", players make decisions at multiple stages, including choosing heroes, building decks, and fighting. These stages increase the complexity of the game, so strong decision-making ability is required to win.
2. **Deal with incomplete information**: Unlike traditional perfect-information games, "Hearthstone" is a partially observable game in which players cannot see their opponents' decks and hands. Finding the Nash Equilibrium (NE) in this setting is therefore an important problem.
3. **Defeat top-level human players**: To verify the ability of the model, the researchers invited a professional player ranked in the top ten of the official league in China for testing. This player is considered to be in the top 0.0005% of millions of players.

### Main contributions of the paper

- **Introduced improved techniques**: The researchers proposed and applied several improved techniques, including adjusting the discount factor \(\gamma\), random initialization of deck building (Random-CB), balancing data production and consumption to reduce off-policy problems, improving the V-Trace algorithm, isolating models by heroes, and cheating by using hidden information.
- **Achieved significant progress**: Through these improvements, the model achieved a 73.6% win rate in machine-to-machine evaluations, and in machine-to-human evaluations it defeated the top-level human player in all Best-of-5 matches.

### Formula summary

- **E2E policy function**:
  \[ \pi_\theta(\cdot|o)=\delta\pi_\theta^{CB}(\cdot|o)+(1 - \delta)\pi_\theta^{BT}(\cdot|o) \]
  where \(\delta\) is the phase indicator, and \(\pi_\theta^{CB}\) and \(\pi_\theta^{BT}\) are the policies for the deck-building and battle stages respectively.
- **V-Trace value function estimation**:
  \[ v_t = V_t+\sum_{i = 0}^k\gamma^i\left[\prod_{j = 0}^{i - 1}\min(\rho_{t + j},\bar{c})\right]\min(\rho_{t + i},\bar{\rho})\delta_{t + i} \]
  where \(\rho_t = \pi_\theta(a_t|o_t)/\mu(a_t|o_t)\) is the importance ratio between the learner policy and the behavior policy \(\mu\), \(\delta_{t + i} = r_{t + i}+\gamma V_{t + i + 1}-V_{t + i}\) is the temporal-difference error (not the phase indicator \(\delta\) above), and \(\bar{c}\) and \(\bar{\rho}\) are hyperparameters used to control variance and bias.
- **PPO policy gradient**:
  \[ \nabla_\theta\min\left(\frac{(r_t+\gamma v_{t + 1}-V_t)\,\pi_\theta(a_t|o_t)}{\mu(a_t|o_t)},\;(r_t+\gamma v_{t + 1}-V_t)\,\text{clip}\left(\frac{\pi_\theta(a_t|o_t)}{\mu(a_t|o_t)},1-\epsilon,1+\epsilon\right)\right) \]
  where \(r_t+\gamma v_{t + 1}-V_t\) is the advantage estimate built from the V-Trace target and \(\epsilon\) is the clipping range.

Through these methods, the researchers improved the performance of the model and demonstrated its strong ability in complex strategy card games.
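
The V-Trace target and the clipped policy-gradient objective summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`vtrace_targets`, `vtrace_advantages`, `ppo_clipped_surrogate`), the default hyperparameter values, and the choice to sum to the end of a single trajectory (that is, \(k\) reaches the last step) are all hypothetical. The surrogate uses the standard PPO form, in which the clipped ratio is not divided by \(\mu\) a second time.

```python
from typing import List, Sequence


def vtrace_targets(rewards: Sequence[float], values: Sequence[float],
                   bootstrap_value: float, rhos: Sequence[float],
                   gamma: float = 0.99, c_bar: float = 1.0,
                   rho_bar: float = 1.0) -> List[float]:
    """Return v_t for every step of one trajectory of length T.

    rewards[t] is r_t, values[t] is V_t, rhos[t] is pi(a_t|o_t) / mu(a_t|o_t),
    and bootstrap_value is the value estimate after the last step.
    """
    steps = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    # Clipped TD errors: min(rho_t, rho_bar) * (r_t + gamma * V_{t+1} - V_t).
    deltas = [
        min(rhos[t], rho_bar) * (rewards[t] + gamma * next_values[t] - values[t])
        for t in range(steps)
    ]
    targets = [0.0] * steps
    acc = 0.0  # holds v_{t+1} - V_{t+1} while iterating backwards
    for t in reversed(range(steps)):
        # Backward recursion equivalent to the discounted sum with trace products.
        acc = deltas[t] + gamma * min(rhos[t], c_bar) * acc
        targets[t] = values[t] + acc
    return targets


def vtrace_advantages(rewards: Sequence[float], values: Sequence[float],
                      targets: Sequence[float], bootstrap_value: float,
                      gamma: float = 0.99) -> List[float]:
    """Advantage estimate r_t + gamma * v_{t+1} - V_t used in the policy gradient."""
    next_targets = list(targets[1:]) + [bootstrap_value]
    return [rewards[t] + gamma * next_targets[t] - values[t]
            for t in range(len(rewards))]


def ppo_clipped_surrogate(pi_prob: float, mu_prob: float, advantage: float,
                          epsilon: float = 0.2) -> float:
    """Per-sample clipped surrogate min(A * ratio, A * clip(ratio, 1-eps, 1+eps))."""
    ratio = pi_prob / mu_prob
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(advantage * ratio, advantage * clipped)


if __name__ == "__main__":
    # Toy on-policy trajectory (all ratios equal to 1) with a win reward at the end.
    rewards = [0.0, 0.0, 1.0]
    values = [0.5, 0.5, 0.5]
    rhos = [1.0, 1.0, 1.0]
    targets = vtrace_targets(rewards, values, 0.0, rhos, gamma=1.0)
    advantages = vtrace_advantages(rewards, values, targets, 0.0, gamma=1.0)
    print(targets, advantages)
    print(ppo_clipped_surrogate(pi_prob=0.6, mu_prob=0.4, advantage=advantages[0]))
```

With all importance ratios equal to 1 and \(\gamma = 1\), the targets reduce to the undiscounted Monte Carlo return, which is a convenient sanity check. In training, the surrogate would be averaged over a batch and differentiated with respect to \(\theta\) by an automatic-differentiation framework; that step is omitted here.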

Mastering Strategy Card Game (Hearthstone) with Improved Techniques
