Exploration Analysis in Finite-Horizon Turn-based Stochastic Games.

Jialian Li, Yichi Zhou, Tongzheng Ren, Jun Zhu
2020-01-01
Abstract: The exploration-exploitation trade-off is one of the key concerns in reinforcement learning. Previous work on one-player Markov Decision Processes has reached near-optimal results for both PAC and high-probability regret guarantees. However, such an analysis is lacking for the more complex stochastic games with multiple players, where all players aim to find an approximate Nash Equilibrium. In this work, we address the exploration issue for N-player finite-horizon turn-based stochastic games (FTSG). We propose a framework, Upper Bounding the Values for Players (UBVP), to guide exploration in FTSGs. UBVP leverages the key insight that players choose the optimal policy conditioning on the policies of the others simultaneously; thus players can explore in the face of uncertainty and get close to the Nash Equilibrium. Based on UBVP, we present two provable algorithms. One is Uniform-PAC with a sample complexity of $\tilde{O}(1/\epsilon^2)$ to get an $\epsilon$-Nash Equilibrium for arbitrary $\epsilon > 0$, and the other has a cumulative exploitability of $\tilde{O}(\sqrt{T})$ with high probability.