Fictitious self-play-based multi-person incomplete information game policy resolving method, device and system as well as storage medium

Wang Xuan, Qi Shuhan, Jiang Lin, Hu Shuhao, Mao Jianbo, Liao Qing, Li Huale, Zhang Jiajia, Liu Yang, Xia Wen
2019-01-01
Abstract: The invention provides a method, device, system and storage medium for solving multi-player incomplete-information game policies based on fictitious self-play. The method comprises the following steps: for the two-player case, the average policy is generated using multi-class logistic regression and reservoir sampling, and the best-response policy is generated using a DQN (Deep Q-Network) with a circular buffer memory; for the multi-player case, the best-response policy is implemented using a multi-agent proximal policy optimization (MAPPO) algorithm, while agent training is adjusted using multi-agent NFSP (Neural Fictitious Self-Play). The beneficial effects are that a fictitious self-play algorithm framework is introduced; the Texas Hold'em poker policy optimization process is split into average policy learning and best-response policy learning, which are implemented by imitation learning and deep reinforcement learning respectively; and a more general multi-agent optimal policy learning method is designed.
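The two memories named in the abstract behave differently, and a minimal sketch may make the contrast concrete. The class and function names below (`ReservoirBuffer`, `CircularBuffer`, `fill`) are hypothetical and are not taken from the invention; plain integers stand in for the stored samples. Reservoir sampling keeps a uniform sample over everything seen so far, which suits learning an average policy, while a circular buffer keeps only the most recent items, which suits DQN-style best-response learning.

```python
import random
from collections import deque


class ReservoirBuffer:
    """Fixed-size uniform sample over a stream (classic reservoir sampling)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of items offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Still filling up: keep every item.
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


class CircularBuffer:
    """Fixed-size memory that overwrites the oldest item when full."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, item):
        self.items.append(item)


def fill(buffer, n):
    """Feed the integers 0..n-1 into a buffer and return its contents."""
    for t in range(n):
        buffer.add(t)
    return list(buffer.items)
```

For example, `fill(CircularBuffer(3), 10)` retains only the last three items, whereas `fill(ReservoirBuffer(3, seed=0), 10)` retains three items drawn from the whole stream.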