NFSP-PER: an Efficient Sampling NFSP-based Method with Prioritized Experience Replay

Huale Li, Shuhan Qi, Jiajia Zhang, Dandan Zhang, Lin Yao, Xuan Wang, Qi Li, Jing Xiao
DOI: https://doi.org/10.1109/icdis55630.2022.00065
2022-01-01
Abstract: In two-player imperfect-information games (IIGs), a Nash equilibrium strategy is the typical goal when solving these games. Neural fictitious self-play (NFSP) is a popular method for finding Nash equilibria in IIGs and the first end-to-end method for obtaining a Nash equilibrium strategy. NFSP computes a best response with deep reinforcement learning and fits its Nash equilibrium strategy through supervised learning. However, training NFSP requires a large amount of sample data, and the interaction cost of obtaining such data is often very high. Training the networks efficiently with limited samples is therefore an urgent problem. In this work, we propose a new NFSP-based variant, called NFSP-PER, which combines NFSP with a prioritized experience replay mechanism to improve sample training efficiency. Extensive experimental results show that the proposed NFSP-PER effectively improves sample learning efficiency compared with the baseline methods.
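The abstract does not say how priorities are computed. As a rough illustration only, the Python sketch below implements the proportional variant of prioritized experience replay from Schaul et al. (2016): transition i is sampled with probability P(i) = p_i^α / Σ_k p_k^α, where p_i = |δ_i| + ε is its absolute TD error, and importance-sampling weights w_i = (N·P(i))^(−β), normalised by their maximum, correct the bias this introduces. In NFSP, such a buffer would most plausibly replace the uniform circular memory of the best-response (DQN) network, leaving the reservoir-sampled supervised memory unchanged. That placement, the class name `PrioritizedReplayBuffer`, and the default values of `alpha`, `beta` and `eps` are all assumptions, not details taken from the paper.

```python
import random
from typing import Any, List, Tuple


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer (not the paper's code).

    Assumption: priorities are |TD error| + eps, as in Schaul et al.'s
    proportional variant; NFSP-PER may use a different scheme.
    """

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling; larger values favour high-error transitions
        self.eps = eps      # keeps every priority strictly positive
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0        # slot to overwrite next once the buffer is full

    def add(self, transition: Any) -> None:
        # New transitions receive the current maximum priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self) -> List[float]:
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size: int, beta: float = 0.4) -> Tuple[List[int], List[Any], List[float]]:
        probs = self.probabilities()
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^-beta, divided by the largest
        # possible weight in the buffer (the one for the least likely transition).
        max_w = (n * min(probs)) ** (-beta)
        weights = [(n * probs[i]) ** (-beta) / max_w for i in idxs]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs: List[int], td_errors: List[float]) -> None:
        # After a learning step, each sampled transition's priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


if __name__ == "__main__":
    random.seed(0)
    buf = PrioritizedReplayBuffer(capacity=4)
    for t in range(4):
        buf.add(("state", t))
    # Pretend the last transition produced a much larger TD error than the others.
    buf.update_priorities([0, 1, 2, 3], [0.1, 0.1, 0.1, 5.0])
    idxs, batch, weights = buf.sample(batch_size=8)
    print(idxs, weights)
```

Sampling here costs O(N) per call through `random.choices`; a sum tree is the usual choice for large buffers, but it is omitted to keep the sketch short. In a full training loop, the returned `weights` would scale each transition's loss term and the new TD errors would be fed back through `update_priorities`.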
What problem does this paper attempt to address?