Convergence of Policy Gradient Methods for Nash Equilibria in General-sum Stochastic Games

Yan Chen, Tao Li
DOI: https://doi.org/10.1016/j.ifacol.2023.10.1494
2023-01-01
IFAC-PapersOnLine
Abstract: We study the learning of Nash equilibria in a general-sum stochastic game with an unknown transition probability density function. Agents take actions at the current environment state, and their joint action influences both the transition of the environment state and their immediate rewards. Each agent observes only the environment state and its own immediate reward, and has no knowledge of the actions or immediate rewards of the others. We introduce the concept of a weighted asymptotic Nash equilibrium with probability 1 and design a two-loop algorithm based on the equivalence between Nash equilibrium problems and variational inequality problems. In the outer loop, we sequentially update a constructed strongly monotone variational inequality by updating a proximal parameter; in the inner loop, we employ a single-call extra-gradient algorithm to solve the constructed variational inequality. We show that if the associated Minty variational inequality has a solution, then the designed algorithm converges to the k^{1/2}-weighted asymptotic Nash equilibrium.
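The two-loop structure described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes an exactly known operator F (the paper works with an unknown transition density and reward feedback only), a hypothetical bilinear game min_x max_y x*y as the variational inequality, and the past extra-gradient (Popov-style) update as one possible single-call scheme. The names `two_loop_solve`, `inner_single_call_eg`, `anchor`, `mu`, and `eta`, as well as the box constraint and all numeric values, are illustrative assumptions. For this toy operator the Minty variational inequality has the solution z = 0.

```python
import numpy as np

# Toy monotone operator (assumption): F(x, y) = (y, -x), which comes from the
# bilinear game min_x max_y x*y. Its only solution is z* = (0, 0).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])


def F(z):
    return A @ z


def project(z, radius=2.0):
    # Euclidean projection onto the box [-radius, radius]^2 (hypothetical constraint set).
    return np.clip(z, -radius, radius)


def inner_single_call_eg(anchor, mu=1.0, eta=0.2, iters=50):
    # Inner loop: approximately solve the strongly monotone regularized problem
    # F_reg(z) = F(z) + mu * (z - anchor) with a past extra-gradient update,
    # which needs one new operator evaluation per iteration.
    def F_reg(z):
        return F(z) + mu * (z - anchor)

    z = anchor.copy()
    g_prev = F_reg(z)  # initial evaluation, reused as the "past" gradient
    for _ in range(iters):
        z_half = project(z - eta * g_prev)  # extrapolate with the stored gradient
        g_prev = F_reg(z_half)              # the single new call in this iteration
        z = project(z - eta * g_prev)       # update with the fresh gradient
    return z


def two_loop_solve(z0, outer_iters=50):
    # Outer loop: move the proximal anchor to the inner loop's approximate solution.
    anchor = np.asarray(z0, dtype=float)
    for _ in range(outer_iters):
        anchor = inner_single_call_eg(anchor)
    return anchor
```

Calling `two_loop_solve(np.array([1.0, 1.0]))` should drive the iterate close to the origin. The proximal term is what makes each inner problem strongly monotone, even though F itself is only monotone.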