Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Weiyu Ma, Qirui Mi, Yongcheng Zeng, Xue Yan, Yuqiao Wu, Runji Lin, Haifeng Zhang, Jun Wang

2024-06-18

Abstract: StarCraft II is a challenging benchmark for AI agents because it requires both precise micro-level operations and strategic macro awareness. Previous works, such as AlphaStar and SCC, achieve impressive performance on StarCraft II; however, they still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents, such as Voyager and MetaGPT, show immense potential in solving intricate tasks. Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II, a highly complex RTS game. To take full advantage of LLMs' reasoning abilities, we first develop a textual StarCraft II environment, called TextStarCraft II, with which LLM agents can interact. Secondly, we propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Our experiment consists of two parts: first, an evaluation by human experts, which includes assessing the LLMs' mastery of StarCraft II knowledge and the performance of LLM agents in the game; second, the in-game performance of LLM agents, encompassing aspects like win rate and the impact of Chain of Summarization. Experiment results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level. We have open-sourced the code and released demo videos of the LLM agent playing StarCraft II.
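To make the idea of a textual interface concrete, here is a minimal Python sketch of what single-frame summarization might look like: one raw observation is condensed into a short text block an LLM can read. The observation is assumed to be a small dataclass; the real TextStarCraft II observation format is not described here, so `RawObservation`, its fields, and `summarize_frame` are all hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RawObservation:
    # Hypothetical subset of game state; a real environment exposes far more.
    game_time_s: int
    minerals: int
    vespene: int
    supply_used: int
    supply_cap: int
    own_units: list = field(default_factory=list)
    enemy_units: list = field(default_factory=list)


def _format_counts(counts: Counter) -> str:
    # Sort unit names so the text is deterministic across calls.
    text = ", ".join(f"{name} x{n}" for name, n in sorted(counts.items()))
    return text or "none"


def summarize_frame(obs: RawObservation) -> str:
    """Hypothetical single-frame summarization: one observation -> compact text."""
    minutes, seconds = divmod(obs.game_time_s, 60)
    lines = [
        f"Time: {minutes:02d}:{seconds:02d}",
        f"Resources: {obs.minerals} minerals, {obs.vespene} gas",
        f"Supply: {obs.supply_used}/{obs.supply_cap}",
        f"Own units: {_format_counts(Counter(obs.own_units))}",
        f"Visible enemy units: {_format_counts(Counter(obs.enemy_units))}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    obs = RawObservation(245, 350, 100, 23, 31,
                         ["Probe"] * 20 + ["Zealot"], ["Zergling", "Zergling"])
    print(summarize_frame(obs))
```

Run on the example observation, this prints a five-line summary starting with `Time: 04:05` and listing `Probe x20, Zealot x1`. Collapsing unit lists into counts is one plausible way to keep prompts short; the paper's actual summarization rules may differ.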

Artificial Intelligence

What problem does this paper attempt to address?

### The Problems This Paper Attempts to Solve

This paper primarily addresses the following key issues:

1. **Benchmarking Real-Time Strategic Decision-Making and Long-Term Planning**: As large language models (LLMs) continue to improve in reasoning, planning, and decision-making, evaluating these capabilities becomes crucial. However, existing benchmarks leave a significant gap for real-time strategic decision-making and long-term planning, especially in complex game environments like StarCraft II. The authors therefore developed the `TextStarCraft II` environment to fill this gap.
2. **Enhancing LLMs' Strategic Decision-Making Abilities**: To address the limitations of traditional Chain of Thought (CoT) methods in handling complex information, the authors proposed the Chain of Summarization (CoS) method. Through its single-frame and multi-frame summarization modules, this method improves the efficiency with which LLMs process complex information and make strategic decisions.
3. **Diversified Evaluation Methods**: To comprehensively evaluate the performance of LLMs in StarCraft II, the authors not only tested the StarCraft II knowledge of various commercial models but also conducted human expert reviews and human-versus-agent matches. These evaluations demonstrated the potential of LLMs in strategic decision-making and human-like gameplay.

In summary, this paper aims to enhance and evaluate the performance of LLMs in real-time strategic decision-making and long-term planning tasks by developing new environments and methods, particularly in complex game environments.
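The description of CoS above stays at a high level. The following is a minimal sketch of how a multi-frame summarization loop could be organized. It assumes a fixed window of `k` frame summaries, a generic `llm` callable (prompt in, text out), and an `ACTION:` line format for commands; `MultiFrameSummarizer`, the prompt wording, and that format are hypothetical and not the paper's actual prompt or interface.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class MultiFrameSummarizer:
    """Hypothetical multi-frame summarization loop (not the authors' code).

    Buffers k single-frame summaries, makes one LLM call per full window to
    analyze the situation and recommend commands, then hands out one queued
    command per frame until the next window closes.
    """

    def __init__(self, llm: Callable[[str], str], k: int = 5) -> None:
        self.llm = llm                                  # any prompt -> text function (assumption)
        self.k = k
        self.frames: Deque[str] = deque(maxlen=k)       # recent frame summaries
        self.pending: Deque[str] = deque()              # commands awaiting execution

    def build_prompt(self) -> str:
        history = "\n---\n".join(self.frames)
        return (
            "You are playing StarCraft II.\n"
            f"Summaries of the last {len(self.frames)} frames:\n{history}\n"
            "1. Analyze the situation. 2. Recommend commands. "
            "3. Output one command per line prefixed with 'ACTION:'."
        )

    @staticmethod
    def parse_actions(reply: str) -> List[str]:
        # Keep only lines in the assumed 'ACTION: <command>' format.
        return [line.split("ACTION:", 1)[1].strip()
                for line in reply.splitlines() if line.startswith("ACTION:")]

    def step(self, frame_summary: str) -> Optional[str]:
        self.frames.append(frame_summary)
        if len(self.frames) == self.k:
            # Window is full: a single LLM call covers k frames of gameplay.
            self.pending.extend(self.parse_actions(self.llm(self.build_prompt())))
            self.frames.clear()
        # Return the next queued command, or None if nothing is queued.
        return self.pending.popleft() if self.pending else None
```

With a stub LLM that always answers `ACTION: TRAIN PROBE` and `ACTION: BUILD PYLON`, and `k=3`, five calls to `step` return `None, None, "TRAIN PROBE", "BUILD PYLON", None`. Batching frames this way trades some reaction speed for far fewer LLM calls, which matters in a real-time game where model latency is much longer than a game frame.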

LLM-PySC2: Starcraft II learning environment for Large Language Models

SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models

S2rl

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

Revisiting of AlphaStar

TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game

SC-Phi2: A Fine-tuned Small Language Model for StarCraft II Macromanagement Tasks

Grandmaster level in StarCraft II using multi-agent reinforcement learning

A Hierarchical Model for StarCraft II Mini-Game

StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning

An Introduction of mini-AlphaStar

A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models

TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game

StarCraft II: A New Challenge for Reinforcement Learning

Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Benchmarking Large Language Model (LLM) Performance for Game Playing via Tic-Tac-Toe

SmartPlay: A Benchmark for LLMs as Intelligent Agents