Strategy Optimization of Imperfect Information Games Based on NFSP with DDQN

Tuo Qu, Qibin Zhou, Jin Zhu, Fuqing Duan
DOI: https://doi.org/10.1007/978-981-19-6613-2_426
2023-01-01
Abstract: In imperfect information machine games, the participating agents know only part of the information during play and cannot observe the full game state, which introduces additional uncertainty and challenges. This paper proposes a method for optimizing imperfect information game strategies based on Neural Fictitious Self-Play (NFSP) with Double Deep Q-Learning (DDQN), in which the agent alternates between sampling and optimization and uses the DDQN learning algorithm to train the optimal response strategy network. DDQN mitigates Q-value overestimation by decoupling action selection from action evaluation when computing the target Q-value, which aids the convergence of the optimal response strategy network. When training the average response strategy network, experience data are assigned different sampling weights according to their temporal difference (TD) error. This raises the probability that important experience is sampled, makes network learning more efficient, and improves the reliability of the average response strategy. Using Leduc Hold’em as the test domain, the proposed method is compared with several state-of-the-art methods, and the experimental results show that it converges faster and is more reliable.
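The two mechanisms the abstract describes, the DDQN target and TD-error-weighted sampling, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`dqn_target`, `ddqn_target`, `priority_probabilities`, `sample_batch`) are hypothetical, Q-values are given as plain lists rather than network outputs, and the weighting uses the standard proportional scheme from prioritized experience replay, p_i = (|δ_i| + ε)^α, as an assumed stand-in. The abstract does not say how TD errors are defined for the average response strategy network, which values of α and ε are used, or whether importance-sampling corrections are applied.

```python
import random
from typing import List, Optional, Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """Standard DQN target: the target network both selects and evaluates
    the next action via max, which tends to overestimate Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward: float, next_q_online: Sequence[float],
                next_q_target: Sequence[float],
                gamma: float = 0.99, done: bool = False) -> float:
    """Double DQN target: the online network selects the next action and
    the target network evaluates it (selection decoupled from evaluation)."""
    if done:
        return reward
    # Selection: argmax over the online network's Q-values for s'.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Evaluation: read the selected action's value from the target network.
    return reward + gamma * next_q_target[best_action]


def priority_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                           eps: float = 1e-6) -> List[float]:
    """Assumed proportional prioritization: p_i = (|delta_i| + eps) ** alpha,
    P(i) = p_i / sum_k p_k. eps keeps zero-error samples reachable."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(buffer: Sequence, td_errors: Sequence[float], batch_size: int,
                 alpha: float = 0.6, rng: Optional[random.Random] = None) -> list:
    """Draw a batch with replacement; items with larger |TD error| are more likely."""
    rng = rng or random.Random()
    weights = priority_probabilities(td_errors, alpha)
    return rng.choices(list(buffer), weights=weights, k=batch_size)


# Usage: the target network overrates action 0, the online network prefers action 1.
online = [1.0, 3.0, 2.0]
target = [5.0, 0.5, 4.0]
print(dqn_target(1.0, target, gamma=0.5))           # 3.5
print(ddqn_target(1.0, online, target, gamma=0.5))  # 1.25
probs = priority_probabilities([0.1, 1.0, 3.0], alpha=1.0)
print([round(p, 3) for p in probs])                 # [0.024, 0.244, 0.732]
```

In the usage lines, plain DQN takes the target network's inflated maximum (5.0), while DDQN evaluates the online network's choice (action 1) with the target network and produces a smaller target, which illustrates how decoupling curbs overestimation. The sampling probabilities show experience with a larger TD error being drawn more often.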