An Efficient Node Selection Policy for Value Network Based Monte Carlo Tree Search

Xiaotian Liu, Yijie Peng, Gongbo Zhang, Ruanbao Zhou
DOI: https://doi.org/10.2139/ssrn.4450999
2023-01-01
Abstract: Monte Carlo Tree Search (MCTS) has recently gained popularity for its effectiveness in solving large-scale decision problems at a controllable computational cost. The success of AlphaGo Zero has prompted a trend of incorporating a value network, constructed with a neural network (NN), into MCTS, an approach referred to as value network-based MCTS. The node selection policy is a key factor in determining the performance and efficiency of MCTS, and in previous literature the node selection policies of value network-based MCTS are all based on the Upper Confidence Bound for Trees (UCT). In this work, we formulate the node selection problem in value network-based MCTS as a Ranking and Selection (R&S) problem under two statistical assumptions and provide a new selection policy based on an R&S algorithm called the Approximately Optimal Allocation Policy (AOAP). We prove the consistency of the proposed selection policy and show that the value network can especially benefit R&S-based node selection policies by providing prior knowledge. We conduct numerical experiments on the board game Tic-tac-toe, and the results show that AOAP outperforms the UCT selection policy used in AlphaGo Zero, which suggests the potential of constructing node selection policies in value network-based MCTS with R&S methods.
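For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of a classic UCT-style node selection rule (mean value plus a UCB1 exploration bonus). It is an illustration only: it is not the paper's AOAP policy, and it is not the exact prior-weighted variant used in AlphaGo Zero. The function name `uct_select`, the `(visit_count, total_value)` representation of children, and the exploration constant `c` are assumptions introduced here.

```python
import math


def uct_select(children, c=1.4):
    """Return the index of the child with the highest UCT score.

    children: list of (visit_count, total_value) tuples (hypothetical layout).
    c: exploration constant (assumed value, not taken from the paper).
    """
    total_visits = sum(n for n, _ in children)
    best_index, best_score = -1, -math.inf
    for i, (n, w) in enumerate(children):
        if n == 0:
            # Unvisited children are tried first so every mean is defined.
            return i
        mean_value = w / n
        exploration = c * math.sqrt(math.log(total_visits) / n)
        score = mean_value + exploration
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With `c=0` the rule is purely greedy on the mean value; larger `c` shifts selection toward children that have been visited less often.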