Solving Poker Games Efficiently: Adaptive Memory Based Deep Counterfactual Regret Minimization

Shuqing Shi, Xiaobin Wang, Dong Hao, Zhiyou Yang, Hong Qu
DOI: https://doi.org/10.1109/ijcnn55064.2022.9892417
2022-01-01
Abstract: Poker has become one of the most prevalent benchmark environments for discovering algorithms for sequential games with imperfect information (SGII). However, in games with a large state space it is hard to traverse the whole game tree, because the space of histories grows exponentially with the input size of the game. Other attempts, such as truncating the game tree at a certain length, have also been made to address this problem, but determining the most suitable length can require an enormous amount of resources. All of these obstacles make algorithms for SGII much harder to design. To solve this kind of problem, we propose the adaptive memory sampling method, which aims to find the distribution of the sampling length by using posterior sampling to update it iteratively. In real-world human interaction, how long a human memory lasts often varies significantly depending on the importance of the interaction trajectory. We therefore also adopt a Long Short-Term Memory (LSTM) network as a sub-procedure to classify histories and to predict future game states and actions based on historically sampled data. According to our theoretical analysis, our method performs better than the state-of-the-art algorithms, and the empirical results support this analysis.
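The abstract describes the adaptive memory sampling method only at a high level: a distribution over sampling (truncation) lengths is updated iteratively by posterior sampling. As a hedged illustration of that general idea, and not the authors' implementation, the sketch below applies Thompson sampling with a Beta-Bernoulli model to a small set of candidate lengths. Everything concrete here is an assumption: the names `AdaptiveLengthSampler` and `simulate`, the candidate lengths, the binary "success" signal for a truncated sample, and the simulated success probabilities. The paper's actual reward signal and its LSTM component are not modeled.

```python
import random


class AdaptiveLengthSampler:
    """Hypothetical sketch: Thompson sampling over candidate truncation lengths.

    Each length L keeps a Beta(alpha, beta) posterior over the (assumed)
    probability that truncating a sampled history at L yields a successful
    sample (reward 1). The reward definition is an assumption, not the paper's.
    """

    def __init__(self, lengths, seed=0):
        self.lengths = list(lengths)
        # Uniform Beta(1, 1) prior for every candidate length.
        self.alpha = {L: 1.0 for L in self.lengths}
        self.beta = {L: 1.0 for L in self.lengths}
        self.rng = random.Random(seed)

    def choose(self):
        # Draw once from each length's posterior; pick the length with the highest draw.
        draws = {L: self.rng.betavariate(self.alpha[L], self.beta[L]) for L in self.lengths}
        return max(draws, key=draws.get)

    def update(self, length, reward):
        # Conjugate Beta-Bernoulli update for a binary reward in {0, 1}.
        self.alpha[length] += reward
        self.beta[length] += 1 - reward

    def posterior_means(self):
        return {L: self.alpha[L] / (self.alpha[L] + self.beta[L]) for L in self.lengths}


def simulate(success_prob, rounds=2000, seed=0):
    """Run the sampler against a hypothetical environment in which truncating
    at length L succeeds with probability success_prob[L]."""
    sampler = AdaptiveLengthSampler(success_prob.keys(), seed=seed)
    env_rng = random.Random(seed + 1)  # separate RNG for the simulated environment
    counts = {L: 0 for L in success_prob}
    for _ in range(rounds):
        L = sampler.choose()
        reward = 1 if env_rng.random() < success_prob[L] else 0
        sampler.update(L, reward)
        counts[L] += 1
    return sampler, counts


if __name__ == "__main__":
    sampler, counts = simulate({2: 0.2, 4: 0.5, 8: 0.8, 16: 0.4})
    print(counts)
    print(sampler.posterior_means())
```

With these made-up success probabilities, the sampler should concentrate most of its choices on length 8 as its posterior sharpens. The design choice here is only that posterior sampling balances trying under-explored lengths against exploiting lengths that have worked so far; how the paper scores a length is not specified in the abstract.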