A Fast-Convergence Method of Monte Carlo Counterfactual Regret Minimization for Imperfect Information Dynamic Games

Xiaoyan Hu, Li Xia, Jun Yang, Qianchuan Zhao
DOI: https://doi.org/10.1109/ddcls49620.2020.9275075
2020-01-01
Abstract: Among existing algorithms for solving imperfect-information extensive-form games, Monte Carlo Counterfactual Regret Minimization (MCCFR) and its variants are the most popular. However, MCCFR converges slowly because its value estimates have high variance. In this paper, we introduce Semi-OS, a fast-convergence method developed from Outcome-Sampling MCCFR (OS), the most popular variant of MCCFR. Semi-OS makes two novel modifications to OS. First, it stores all histories and their values at each information set. Second, each time the strategy is updated, it performs a full game-tree traversal to update these values. Together, these two modifications yield a better estimate of regrets. We show that, with an appropriately chosen discount rate, Semi-OS not only significantly speeds up convergence in Leduc Poker, a common testbed for imperfect-information games, but also statistically outperforms OS in head-to-head Leduc Poker matches involving 200,000 hands.
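The two modifications described in the abstract can be pictured with a minimal sketch. Only `regret_matching` is the standard strategy update shared by CFR-style algorithms. The `InfoSetStore` class, its field names, and in particular the exponentially discounted blend in `update_value` are hypothetical illustrations: the abstract does not specify how the stored values are updated or how the discount rate enters, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass, field


def regret_matching(cum_regret):
    """Return a strategy proportional to the positive cumulative regrets."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    return [1.0 / len(cum_regret)] * len(cum_regret)


@dataclass
class InfoSetStore:
    """Hypothetical container holding one value per history in an information set."""
    n_actions: int
    history_values: dict = field(default_factory=dict)  # history label -> stored value
    cum_regret: list = field(default_factory=list)

    def __post_init__(self):
        if not self.cum_regret:
            self.cum_regret = [0.0] * self.n_actions

    def update_value(self, history, new_value, discount):
        # ASSUMPTION: blend old and new values with an exponential discount.
        # The abstract only says a discount rate is selected, not how it is used.
        old = self.history_values.get(history, new_value)
        self.history_values[history] = discount * old + (1.0 - discount) * new_value

    def strategy(self):
        return regret_matching(self.cum_regret)
```

For example, `regret_matching([2.0, -1.0, 2.0])` returns `[0.5, 0.0, 0.5]`: the action with negative regret receives no probability, and the remaining mass is split in proportion to the positive regrets.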