Abstract: StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL); its main difficulties include a huge state space, a varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach in which the hierarchy involves two levels of abstraction. The first is a set of macro-actions extracted from experts' demonstration trajectories, which reduces the action space by an order of magnitude. The second is a hierarchical architecture of neural networks, which is modular and facilitates scaling. We also investigate a curriculum transfer training procedure that trains the agent from the simplest difficulty level to the hardest. We train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using a restricted set of units, we achieve a 99% win rate against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the most difficult cheating-level AIs (level-8, level-9, and level-10). We also test our method on different maps to evaluate the extensibility of our approach. With a final three-layer hierarchical architecture and several significant training tricks, we increase the win rates against level-8, level-9, and level-10 to 96%, 97%, and 94%, respectively. Our code and models are open-sourced at https://github.com/liuruoze/HierNet-SC2. To provide a baseline that references AlphaStar, for our work as well as for the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07; it can be trained with supervised learning and reinforcement learning on the raw action space, which has 564 actions. mAS is designed to run training on a single common machine by making the hyper-parameters adjustable and simplifying some settings. We can then compare our work with mAS using the same computing resources and training time. Experimental results show that our method is more effective when resources are limited. The inference and training code of mini-AlphaStar is open-sourced at https://github.com/liuruoze/mini-AlphaStar. We hope our study can shed some light on future research into efficient reinforcement learning for SC2 and other large-scale games.
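Two ideas in the abstract, extracting macro-actions from demonstrations and curriculum transfer from easy to hard opponents, can be illustrated with a minimal Python sketch. This is not the authors' implementation. Macro-action mining is approximated here by counting frequent contiguous action n-grams, which is a stand-in and not necessarily the paper's mining algorithm. The curriculum loop assumes hypothetical `train_step(level)` and `evaluate(level)` callbacks that update and score one shared agent, and it advances to the next level once an assumed win-rate threshold is reached.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Action = Hashable
MacroAction = Tuple[Action, ...]


def extract_macro_actions(
    trajectories: Sequence[Sequence[Action]],
    length: int = 2,
    top_k: int = 10,
) -> List[MacroAction]:
    """Return the top_k most frequent contiguous action subsequences.

    Stand-in for macro-action mining (assumption): counts every contiguous
    n-gram of `length` actions across all demonstration trajectories.
    """
    counts: Counter = Counter()
    for traj in trajectories:
        # Trajectories shorter than `length` contribute nothing (empty range).
        for i in range(len(traj) - length + 1):
            counts[tuple(traj[i : i + length])] += 1
    return [macro for macro, _ in counts.most_common(top_k)]


def curriculum_train(
    train_step: Callable[[int], None],
    evaluate: Callable[[int], float],
    levels: Sequence[int],
    target_win_rate: float = 0.9,
    max_rounds_per_level: int = 100,
) -> List[Tuple[int, float]]:
    """Train on each difficulty level in order, carrying one agent forward.

    `train_step` and `evaluate` are hypothetical callbacks: the first updates
    the shared agent against `level`, the second returns its win rate there.
    The threshold-based advancement rule is an assumption for illustration.
    """
    history: List[Tuple[int, float]] = []
    for level in levels:
        win_rate = evaluate(level)
        rounds = 0
        # Keep training at this level until the agent is good enough
        # (or the round budget runs out), then move to a harder level.
        while win_rate < target_win_rate and rounds < max_rounds_per_level:
            train_step(level)
            win_rate = evaluate(level)
            rounds += 1
        history.append((level, win_rate))
    return history
```

Because the same agent object is updated at every level, whatever it learned against easier opponents is the starting point for the next one, which is the transfer part of curriculum transfer in this sketch.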

Centralized control for multi-agent RL in a complex Real-Time-Strategy game

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control

SC-MAIRL: Semi-Centralized Multi-Agent Imitation Reinforcement Learning

MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

From Centralized to Self-Supervised: Pursuing Realistic Multi-Agent Reinforcement Learning

Decentralized multi-agent reinforcement learning based on best-response policies

Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances

Multi-agent Reinforcement Learning: A Comprehensive Survey

Towards Distributed Communication and Control in Real-World Multi-Agent Reinforcement Learning

Reinforcement actor-critic learning as a rehearsal in MicroRTS

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games

Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning

Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches

Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

Learning Macromanagement in Starcraft by Deep Reinforcement Learning

Multiagent Reinforcement Learning for Strategic Decision Making and Control in Robotic Soccer Through Self-Play