Abstract: Despite its groundbreaking success in Go and computer games, Monte Carlo Tree Search (MCTS) is computationally expensive, as it requires a substantial number of rollouts to construct the search tree, which calls for effective parallelization. However, how to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. In this paper, we seek to lay a first theoretical foundation for parallel MCTS by examining the potential performance loss caused by parallelization when achieving a desired speedup. In particular, we identify necessary conditions for achieving desirable parallelization performance and highlight two of their practical benefits. First, by examining whether existing parallel MCTS algorithms satisfy these conditions, we identify key design principles that future algorithms should inherit, for example tracking the unobserved samples (used in WU-UCT (Liu et al., 2020)). We theoretically establish that this design yields $\mathcal{O} ( \ln n + M / \sqrt{\ln n} )$ cumulative regret when the maximum tree depth is 2, where $n$ is the number of rollouts and $M$ is the number of workers. A regret of this form is highly desirable: compared with the $\mathcal{O} ( \ln n )$ regret incurred by a sequential counterpart, its excess part, $M / \sqrt{\ln n}$, approaches zero as $n$ increases. Second, and more importantly, we demonstrate how the proposed necessary conditions can be used to design more effective parallel MCTS algorithms. To illustrate this, we propose a new parallel MCTS algorithm, called BU-UCT, that follows our theoretical guidelines. The newly proposed algorithm, albeit preliminary, outperforms four competitive baselines on 11 out of 15 Atari games. We hope our theoretical results will inspire future work on more effective parallel MCTS.
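To make the shape of the regret bound concrete, the following minimal Python sketch evaluates the two terms inside $\mathcal{O} ( \ln n + M / \sqrt{\ln n} )$ for a growing number of rollouts. It is an illustration only: the constants hidden by the big-O notation are assumed to be 1, and the function names `sequential_term` and `excess_term` as well as the sample values of $n$ and $M$ are hypothetical choices, not taken from the paper.

```python
import math


def sequential_term(n: int) -> float:
    """ln n: the growth term shared with the sequential regret bound."""
    if n < 2:
        raise ValueError("n must be at least 2 so that ln n > 0")
    return math.log(n)


def excess_term(n: int, num_workers: int) -> float:
    """M / sqrt(ln n): the extra term attributed to parallelization.

    The hidden big-O constant is assumed to be 1 for illustration.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that ln n > 0")
    return num_workers / math.sqrt(math.log(n))


if __name__ == "__main__":
    num_workers = 16  # hypothetical worker count M
    for n in (10**2, 10**4, 10**8, 10**16):
        seq = sequential_term(n)
        exc = excess_term(n, num_workers)
        # The excess term falls as n grows, while the ln n term keeps rising.
        print(f"n={n:.0e}  ln n={seq:.2f}  M/sqrt(ln n)={exc:.2f}")
```

Under these assumptions the excess term decreases monotonically in $n$ for a fixed $M$, but only at the rate $1 / \sqrt{\ln n}$, so it shrinks slowly when $M$ is large relative to $\sqrt{\ln n}$.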

ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Epistemic Monte Carlo Tree Search

Dual Monte Carlo Tree Search

Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions

Mastering Atari Games with Limited Data

Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search

AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time

Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

Monte Carlo Tree Search: a review of recent modifications and applications

Fittest Survival: an Enhancement Mechanism for Monte Carlo Tree Search

Mastering construction heuristics with self-play deep reinforcement learning

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

Monte-Carlo Graph Search for AlphaZero

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning

On Effective Parallelization of Monte Carlo Tree Search

MetroZero: Deep Reinforcement Learning and Monte Carlo Tree Search for Optimized Metro Network Expansion

Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

Monte Carlo Tree Search based Space Transfer for Black-box Optimization

MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games