Abstract: Monte-Carlo Tree Search (MCTS) typically uses multi-armed bandit (MAB) strategies designed to minimize cumulative regret, such as UCB1, as its selection strategy. However, in the root node of the search tree, it is more sensible to minimize simple regret. Previous work has proposed using Sequential Halving as selection strategy in the root node, as, in theory, it performs better with respect to simple regret. However, Sequential Halving requires a budget of iterations to be predetermined, which is often impractical. This paper proposes an anytime version of the algorithm, which can be halted at any arbitrary time and still return a satisfactory result, while being designed such that it approximates the behavior of Sequential Halving. Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1 (and their analogues in MCTS).
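The abstract contrasts cumulative regret, which UCB1 is designed to minimize, with simple regret, which is what matters for the final move chosen at the root. The minimal Python sketch below illustrates the difference, assuming Bernoulli arms whose true means are known only for scoring. It runs the standard UCB1 rule, then reports cumulative regret (the summed gap of every arm played) and simple regret (the gap of the single arm recommended at the end). The function name `ucb1_run`, the constant `c = sqrt(2)`, and the most-visited recommendation rule are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def ucb1_run(means, n_steps, c=math.sqrt(2), seed=0):
    """Play UCB1 on Bernoulli arms and return (cumulative_regret, simple_regret).

    Hypothetical helper: `means` are the true success probabilities, used only
    to draw rewards and to score regret, never by the selection rule itself.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    cumulative = 0.0
    for t in range(n_steps):
        if t < k:
            arm = t  # initialise: play every arm once
        else:
            # UCB1 index: empirical mean + c * sqrt(ln(total plays) / plays of arm)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        cumulative += best - means[arm]  # every play counts toward cumulative regret
    # Final recommendation: the most-visited arm (a common choice in MCTS).
    recommended = max(range(k), key=lambda i: counts[i])
    simple = best - means[recommended]  # only the recommendation counts
    return cumulative, simple


if __name__ == "__main__":
    cum, simple = ucb1_run([0.2, 0.5, 0.8], n_steps=2000)
    print(f"cumulative regret: {cum:.1f}, simple regret: {simple:.1f}")
```

Cumulative regret keeps growing with every exploratory play, while simple regret can be zero as soon as the recommendation is correct. This difference is why an algorithm tuned for one objective is not automatically best for the other.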

What problem does this paper attempt to address?

This paper addresses the following problem: how to design a selection strategy for Monte-Carlo Tree Search (MCTS) that has the "anytime" property, so that it can replace the standard Sequential Halving (SH) algorithm at the root node.

### Problem Background

MCTS is a search algorithm for sequential decision-making problems and is widely used in games, planning, optimization, and control. One of the four key steps of MCTS is selection, for which Multi-Armed Bandit (MAB) algorithms are usually used to balance exploration and exploitation. Common MAB algorithms such as UCB1 are designed to minimize cumulative regret. Sequential Halving instead focuses on minimizing simple regret, which is the more appropriate objective at the root node of MCTS, where only the quality of the final move choice matters.

However, Sequential Halving requires the iteration budget to be fixed in advance, which is often impractical. Examples include playing a large and diverse set of games, playing automatically generated games, or running agents that manage their own thinking time. In such settings a suitable budget is hard to determine beforehand, and an algorithm without the anytime property can perform poorly.

### Paper Solution

To solve this problem, the paper proposes a new algorithm, Anytime Sequential Halving (Anytime SH). It can be stopped at any point and still return a satisfactory result, while behaving similarly to Sequential Halving. Specifically:

- **Anytime termination**: Anytime SH can be halted at any point, and the quality of its final decision generally improves as it is given more processing time.
- **Behavior approximation**: Anytime SH is inspired by standard Sequential Halving and adjusted so that it has the anytime property.
- **Experimental verification**: Experiments on synthetic MAB problems and ten different board games show that Anytime SH performs competitively with UCB1 and Sequential Halving while retaining the anytime property.
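For reference, a fixed-budget Sequential Halving loop can be sketched as below. It assumes a hypothetical `pull(arm)` callback that returns one sampled reward. The budget is split evenly over ceil(log2(K)) rounds, each surviving arm receives an equal share of its round, and the better half by empirical mean survives to the next round. This follows the common textbook formulation rather than the paper's MCTS code. For simplicity it also reuses statistics across rounds. Note that `budget` must be supplied up front, which is exactly the limitation the paper targets.

```python
import math
import random


def sequential_halving(pull, k, budget):
    """Fixed-budget Sequential Halving over arms 0..k-1 (illustrative sketch).

    Returns (recommended_arm, pulls_used). Each surviving arm gets at least
    one pull per round, so very small budgets can be exceeded.
    """
    arms = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    rounds = max(1, math.ceil(math.log2(k)))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        # Equal share of this round's budget for every surviving arm.
        per_arm = max(1, budget // (len(arms) * rounds))
        for arm in arms:
            for _ in range(per_arm):
                sums[arm] += pull(arm)
                counts[arm] += 1
        # Keep the better half (rounded up) by empirical mean.
        arms.sort(key=lambda a: sums[a] / counts[a], reverse=True)
        arms = arms[: math.ceil(len(arms) / 2)]
    return arms[0], sum(counts)


if __name__ == "__main__":
    rng = random.Random(1)
    means = [0.3, 0.45, 0.5, 0.7]
    best, used = sequential_halving(
        lambda a: 1.0 if rng.random() < means[a] else 0.0, len(means), budget=400
    )
    print(f"recommended arm {best} after {used} pulls")
```

Because every round's allocation is computed from `budget`, stopping this loop early leaves later rounds unplayed and the surviving set only partly reduced. That is why a predetermined budget is hard to avoid in plain SH.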
### Summary

The core problem of this paper is to improve the selection strategy in MCTS so that it minimizes simple regret while also supporting flexible time management in practical applications. The proposed Anytime SH algorithm addresses this problem, and in the experiments its performance is competitive with Sequential Halving and UCB1.
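To illustrate what "anytime" means here, the sketch below shows one hypothetical way to get SH-like behavior without a fixed budget. It repeatedly runs short halving passes: each surviving arm gets one sample per level, then the better half by empirical mean continues, and each pass restarts from all arms. It checks a `should_stop()` callback before every sample, so it can be interrupted at any moment. The names `anytime_halving` and `should_stop` and the one-sample-per-level rule are assumptions for illustration. This is not presented as the paper's exact Anytime SH algorithm.

```python
import math
import random
import time


def anytime_halving(pull, k, should_stop):
    """Hypothetical anytime, SH-inspired loop over arms 0..k-1 (k must be >= 1).

    Runs repeated halving passes until should_stop() returns True, then
    returns the recommended arm. `pull(arm)` returns one sampled reward.
    """
    counts = [0] * k
    sums = [0.0] * k

    def mean(a):
        return sums[a] / counts[a] if counts[a] else 0.0

    def recommend():
        # Most-visited arm; ties broken by higher empirical mean.
        return max(range(k), key=lambda a: (counts[a], mean(a)))

    while True:
        survivors = list(range(k))  # each pass restarts from all arms
        while True:
            for arm in survivors:  # one sample per surviving arm at this level
                if should_stop():
                    return recommend()
                sums[arm] += pull(arm)
                counts[arm] += 1
            if len(survivors) == 1:
                break  # pass finished; start the next one
            # Keep the better half by empirical mean (statistics persist across passes).
            survivors.sort(key=mean, reverse=True)
            survivors = survivors[: math.ceil(len(survivors) / 2)]


if __name__ == "__main__":
    rng = random.Random(2)
    means = [0.3, 0.45, 0.5, 0.7]
    deadline = time.monotonic() + 0.05  # stop after roughly 50 ms, however far it got
    arm = anytime_halving(
        lambda a: 1.0 if rng.random() < means[a] else 0.0,
        len(means),
        lambda: time.monotonic() >= deadline,
    )
    print(f"recommended arm {arm}")
```

Within each pass, arms that survive more halving levels receive more samples, which is the qualitative behavior of Sequential Halving. The exact allocation differs from SH's equal per-round budget. The design choice is that no quantity depends on a total budget, so a recommendation is available whenever the caller stops the loop.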

Anytime Sequential Halving in Monte-Carlo Tree Search