Neural Future-Dependent Online Mirror Descent

Kezhe Xie, Weiming Liu, Bin Li
DOI: https://doi.org/10.1145/3639631.3639632
2023-01-01
Abstract: In recent years, many neural-network-based iterative Nash Equilibrium Finding (NEF) algorithms have been proposed for solving large-scale Extensive-Form Imperfect-Information Games (EF-IIGs). These algorithms use neural networks to approximate the values and strategies that change at every iteration, which makes NEF algorithms applicable to large-scale games. However, previous algorithms usually have to approximate some kind of value that is accumulated over iterations, so their approximation error also accumulates as the iterations progress. As a result, these algorithms are prone to premature convergence and to producing high-exploitability strategies. Additionally, because the approximation target is highly dynamic, many training steps may be needed in every iteration to approximate it well. In this paper, we propose a new algorithm named Neural FD-OMD. Instead of approximating cumulative variables, Neural FD-OMD approximates the strategy at every iteration, so the approximation error is controlled within each iteration and does not propagate to future iterations. More importantly, Neural FD-OMD limits the strategy deviation between iterations. As a result, Neural FD-OMD can track the strategy by fine-tuning the neural network with far fewer training steps than previous algorithms. Experiments on multiple poker games show that Neural FD-OMD significantly outperforms previous NEF algorithms, especially in large-scale games.
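The abstract's key property, that consecutive strategies stay close to each other, is characteristic of mirror-descent updates with an entropy regularizer. The sketch below is NOT Neural FD-OMD: it is plain tabular online mirror descent (the Hedge, or multiplicative-weights, update) with no future-dependent term and no neural network, run in self-play on rock-paper-scissors, which is assumed here only as a toy stand-in for a poker game. The names `omd_step`, `exploitability`, and `run`, the step size `eta`, and the starting strategies are hypothetical choices for illustration. It shows two things: each iteration moves the strategy by a bounded amount controlled by `eta`, and the exploitability of the average strategy shrinks as iterations accumulate.

```python
import math

# Rock-paper-scissors payoffs for the row player; the column player receives the negation.
PAYOFF = [
    [0.0, -1.0, 1.0],   # rock vs (rock, paper, scissors)
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
]
N = len(PAYOFF)


def omd_step(strategy, utilities, eta):
    """Entropic mirror-descent (Hedge) step: x_i <- x_i * exp(eta * u_i), then renormalize."""
    weights = [p * math.exp(eta * u) for p, u in zip(strategy, utilities)]
    total = sum(weights)
    return [w / total for w in weights]


def row_utilities(col_strategy):
    # Expected payoff of each pure row action against the column player's mixed strategy.
    return [sum(PAYOFF[i][j] * col_strategy[j] for j in range(N)) for i in range(N)]


def col_utilities(row_strategy):
    # Zero-sum: the column player's payoff is the negated row payoff.
    return [-sum(PAYOFF[i][j] * row_strategy[i] for i in range(N)) for j in range(N)]


def exploitability(row_strategy, col_strategy):
    # Sum of both players' best-response values (the duality gap); 0 exactly at a Nash equilibrium.
    return max(row_utilities(col_strategy)) + max(col_utilities(row_strategy))


def l1_distance(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def run(iterations=4000, eta=0.03):
    x = [0.6, 0.3, 0.1]  # hypothetical non-equilibrium starting strategies
    y = [0.2, 0.2, 0.6]
    sum_x, sum_y = [0.0] * N, [0.0] * N
    max_step = 0.0
    for _ in range(iterations):
        # Accumulate the strategies actually played this round.
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + p for s, p in zip(sum_y, y)]
        # Both players update simultaneously against the opponent's current strategy.
        new_x = omd_step(x, row_utilities(y), eta)
        new_y = omd_step(y, col_utilities(x), eta)
        # Track the largest one-iteration L1 change in either player's strategy.
        max_step = max(max_step, l1_distance(new_x, x), l1_distance(new_y, y))
        x, y = new_x, new_y
    avg_x = [s / iterations for s in sum_x]
    avg_y = [s / iterations for s in sum_y]
    return avg_x, avg_y, max_step
```

`run()` returns the two averaged strategies, which should lie close to the uniform equilibrium (1/3, 1/3, 1/3), together with the largest one-step L1 move. Because all payoffs lie in [-1, 1], each probability changes by at most a factor of exp(2*eta) per step, so that move cannot exceed exp(2*eta) - 1. Only the averages converge here; the last iterates keep cycling in this game. This toy stores strategies in exact tables, so it does not model the per-iteration approximation error that the abstract discusses.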