Online Immediate Orientation in Monte Carlo Counterfactual Regret Minimization for Simultaneous Games

Yun Li, Jiao Wang, Xinyue Su
DOI: https://doi.org/10.1109/ccdc62350.2024.10588276
2024-01-01
Abstract: Recent research shows that simultaneous-move games can be modeled as imperfect-information problems. This models the simultaneous nature of the players' decisions more accurately and yields more favorable strategies. Monte Carlo Counterfactual Regret Minimization (MCCFR) is considered a valid method for imperfect-information games. However, its convergence rate is strongly affected by the number of samples and by the high variance of its estimates, which limits its direct application to large simultaneous games. To address these challenges, we introduce an improved variant of MCCFR, Online Immediate Orientation in Monte Carlo Counterfactual Regret Minimization (OIO-MCCFR). OIO-MCCFR uses immediate rewards to orient the search. In addition, to reduce the excessive variance of the estimates, it employs a control variate and a state-action baseline. Moreover, we prove a probabilistic bound between the unbiased regret estimate and its exact value. We evaluate OIO-MCCFR on Goofspiel at diverse scales, and the results show that our approach significantly outperforms vanilla MCCFR. More importantly, our experiments also indicate that the larger the game, the greater the advantage of OIO-MCCFR.
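The abstract combines two standard ingredients: regret matching driven by sampled value estimates, and a baseline that acts as a control variate to reduce the variance of those estimates. The sketch below illustrates only these generic ingredients, not OIO-MCCFR itself: it does not implement the immediate-reward orientation, and rock-paper-scissors is a hypothetical one-shot stand-in for Goofspiel. The estimator is the usual baseline-corrected form v(a) = b(a) + 1[a = sampled] * (u - b(a)) / q(a), which is unbiased for any baseline fixed before sampling. The per-action running-mean baseline, the epsilon-exploration sampling policy, and all names (`baseline_corrected_values`, `run`, `epsilon`, `use_baseline`) are illustrative assumptions.

```python
import random

# Hypothetical stand-in game: rock-paper-scissors payoffs for the row player.
# The game is zero-sum, so the column player's payoff is the negation.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
N_ACTIONS = 3


def regret_matching(regrets):
    """Strategy proportional to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [x / total for x in positive]
    return [1.0 / len(regrets)] * len(regrets)


def baseline_corrected_values(sampled, payoff, sample_probs, baseline):
    """Per-action value estimates from a single sampled action.

    v(a) = b(a) + 1[a == sampled] * (payoff - b(a)) / q(a).
    For a baseline b fixed before sampling, E_q[v(a)] equals the true value of a.
    """
    values = list(baseline)
    values[sampled] += (payoff - baseline[sampled]) / sample_probs[sampled]
    return values


def run(iterations=20000, epsilon=0.6, use_baseline=True, seed=0):
    """Sampled regret-matching self-play; returns both players' average strategies."""
    rng = random.Random(seed)
    regrets = [[0.0] * N_ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * N_ACTIONS for _ in range(2)]
    baselines = [[0.0] * N_ACTIONS for _ in range(2)]
    counts = [[0] * N_ACTIONS for _ in range(2)]

    for _ in range(iterations):
        strategies = [regret_matching(regrets[p]) for p in range(2)]
        # Epsilon-exploration keeps every sampling probability q(a) positive.
        sample_probs = [
            [epsilon / N_ACTIONS + (1 - epsilon) * s for s in strategies[p]]
            for p in range(2)
        ]
        actions = [rng.choices(range(N_ACTIONS), weights=sample_probs[p])[0]
                   for p in range(2)]
        row_payoff = PAYOFF[actions[0]][actions[1]]
        payoffs = [row_payoff, -row_payoff]

        for p in range(2):
            b = baselines[p] if use_baseline else [0.0] * N_ACTIONS
            values = baseline_corrected_values(actions[p], payoffs[p], sample_probs[p], b)
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(N_ACTIONS):
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
            # Update the (hypothetical) running-mean baseline only after it has
            # been used, so this iteration's estimate stays unbiased.
            a = actions[p]
            counts[p][a] += 1
            baselines[p][a] += (payoffs[p] - baselines[p][a]) / counts[p][a]

    return [[x / iterations for x in strategy_sums[p]] for p in range(2)]


if __name__ == "__main__":
    for p, avg in enumerate(run()):
        print(f"player {p}: " + ", ".join(f"{x:.3f}" for x in avg))
```

Calling `run(use_baseline=False)` falls back to the plain importance-sampled estimator. Both variants are unbiased; the baseline is intended only to lower the variance of the sampled regrets, which is the motivation the abstract gives for using a control variate.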