Non-Myopic Knowledge Gradient Policy for Ranking and Selection.

Kexin Qin, L. Jeff Hong, Weiwei Fan
DOI: https://doi.org/10.1109/WSC57314.2022.10015275
2022-01-01
Abstract: We consider the ranking and selection (R&S) problem with a fixed simulation budget, in which the budget is assumed to be allocated sequentially. Deriving the optimal sampling procedure for this problem amounts to solving a stochastic dynamic program that is highly intractable. To overcome this difficulty, existing R&S procedures are often designed from a myopic viewpoint. However, these myopic procedures are only single-step optimal and may perform poorly on general sequential R&S problems. Therefore, in this paper, we combine two popular lookahead strategies to design a non-myopic knowledge gradient (KG) procedure. To streamline the computation of the procedure, we also propose a modified Monte Carlo tree search method designed specifically for the R&S context. We show that the new procedure can outperform the classic KG procedure.
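For context on the myopic baseline the abstract refers to, the following is a minimal sketch of the classic one-step KG rule. The abstract itself gives no formulas, so this uses the standard textbook form under stated assumptions: independent normal beliefs on each alternative's mean, known sampling-noise variances, and maximization. The function names `kg_factors` and `kg_choose` are hypothetical, and this is not the non-myopic procedure proposed in the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(means, variances, noise_variances):
    """One-step KG factor of each alternative.

    Assumes (not stated in the abstract): independent normal beliefs with
    positive posterior variances, known noise variances, at least two
    alternatives, and that larger means are better.
    """
    factors = []
    for i, (mu, var, noise) in enumerate(zip(means, variances, noise_variances)):
        # Standard deviation of the change in the posterior mean of
        # alternative i that one additional sample would cause.
        sigma_tilde = var / math.sqrt(var + noise)
        # Best posterior mean among the competing alternatives.
        best_other = max(m for j, m in enumerate(means) if j != i)
        z = -abs(mu - best_other) / sigma_tilde
        # Expected one-step improvement in the maximum posterior mean.
        factors.append(sigma_tilde * (z * normal_cdf(z) + normal_pdf(z)))
    return factors


def kg_choose(means, variances, noise_variances):
    """Index of the alternative with the largest one-step KG factor."""
    factors = kg_factors(means, variances, noise_variances)
    return max(range(len(factors)), key=factors.__getitem__)
```

Because this rule values only the next sample, it is single-step optimal in the sense the abstract describes; a non-myopic procedure would instead evaluate sequences of future samples, which is where a tree search becomes relevant.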