NFSP-PLT: Solving Games with a Weighted NFSP-PER-Based Method

Huale Li, Shuhan Qi, Jiajia Zhang, Dandan Zhang, Lin Yao, Xuan Wang, Qi Li, Jing Xiao
DOI: https://doi.org/10.3390/electronics12112396
IF: 2.9
2023-01-01
Electronics
Abstract: Finding a Nash equilibrium strategy is a typical goal when solving two-player imperfect-information games (IIGs). Neural fictitious self-play (NFSP), the first end-to-end method for computing a Nash equilibrium strategy, is a popular approach in IIGs. However, training NFSP requires a large amount of sample data, and the cost of obtaining such data through interaction is often very high. Training the network efficiently from limited samples is therefore an urgent problem. In this paper, we first propose a new NFSP-based method, NFSP with prioritized experience replay (NFSP-PER), to improve sample training efficiency. We then propose a weighted NFSP-PER with learning time (NFSP-PLT) to control the degree to which priority-weighted samples are utilized. Furthermore, building on NFSP-PLT, an adaptive upper confidence bound applied to trees (UCT) is used to solve for the optimal response strategy, which makes the solved strategy more accurate. Extensive experimental results show that the proposed NFSP-PLT effectively improves sample learning efficiency compared with existing works.
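
For readers unfamiliar with prioritized experience replay, the following is a minimal sketch of the generic proportional variant, in which transitions with larger errors are sampled more often and importance weights correct for the resulting bias. It is not the NFSP-PER or NFSP-PLT method from the paper: the class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, their default values, and the choice to normalize weights by the largest weight in the batch are all assumptions made for illustration, and the learning-time weighting described in the abstract is not modeled.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        self.capacity = capacity
        self.alpha = alpha          # 0 gives uniform sampling, 1 gives full prioritization
        self.items = []
        self.priorities = []
        self.next_index = 0         # slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are seen at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(max_p)
        else:
            self.items[self.next_index] = transition
            self.priorities[self.next_index] = max_p
        self.next_index = (self.next_index + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        n = len(self.items)
        indices = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance weights w_i = (N * P(i))^(-beta), scaled so the largest in the batch is 1.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.items[i] for i in indices]
        return indices, batch, weights

    def update_priorities(self, indices, errors, eps=1e-6):
        # Priority is the absolute error plus a small constant so no sample has zero probability.
        for i, err in zip(indices, errors):
            self.priorities[i] = abs(err) + eps
```

In a training loop under these assumptions, one would call `sample`, scale each sample's loss by its returned weight, and then pass the new per-sample errors to `update_priorities`.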