Abstract: We study a subclass of $n$-player stochastic games, namely, stochastic games with independent chains and unknown transition matrices. In this class of games, players control their own internal Markov chains, whose transitions do not depend on the states or actions of other players. However, players' decisions are coupled through their payoff functions. We assume that players receive only realizations of their payoffs, that they cannot observe the states and actions of other players, and that they do not know the transition probability matrices of their own Markov chains. Relying on a compact dual formulation of the game based on occupancy measures, and on confidence sets that maintain high-probability estimates of the unknown transition matrices, we propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE for this class of games. The proposed algorithm has the desired properties of independence, scalability, and convergence. Specifically, under no assumptions on the reward functions, we show that the proposed algorithm converges in polynomial time, with respect to a weaker distance (namely, the averaged Nikaido-Isoda gap), to the set of $\epsilon$-NE policies with arbitrarily high probability. Moreover, assuming the existence of a variationally stable Nash equilibrium policy, we show that the proposed algorithm converges asymptotically to the stable $\epsilon$-NE policy with arbitrarily high probability. In addition to Markov potential games and linear-quadratic stochastic games, this work provides another subclass of $n$-player stochastic games that, under some mild assumptions, admit polynomial-time learning algorithms for finding their stationary $\epsilon$-NE policies.
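
The abstract names the Nikaido-Isoda gap without defining it. For orientation, the standard form of that gap can be written as below. The symbols $V_i$ (player $i$'s expected payoff), $\pi = (\pi_i, \pi_{-i})$ (the joint policy), and $\pi^{(t)}$ (the policy at iteration $t$) are notation assumed here for illustration and are not taken from the paper, whose exact definition and averaging may differ.

$$
\mathrm{NI}(\pi) \;=\; \sum_{i=1}^{n} \Big[ \max_{\pi_i'} V_i(\pi_i', \pi_{-i}) \;-\; V_i(\pi_i, \pi_{-i}) \Big],
\qquad
\overline{\mathrm{NI}}_T \;=\; \frac{1}{T} \sum_{t=1}^{T} \mathrm{NI}\big(\pi^{(t)}\big).
$$

Each bracketed term is nonnegative, so $\mathrm{NI}(\pi) = 0$ exactly when $\pi$ is a Nash equilibrium, and $\mathrm{NI}(\pi) \le \epsilon$ implies that no player can gain more than $\epsilon$ by deviating unilaterally, i.e., $\pi$ is an $\epsilon$-NE. The second expression is one plausible reading of "averaged", namely the mean gap over the first $T$ iterates.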

Learning in Zero-Sum Markov Games: Relaxing Strong Reachability and Mixing Time Assumptions

Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback

Independent and Decentralized Learning in Markov Potential Games

Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov Games

Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

No-Regret Learning in Network Stochastic Zero-Sum Games

Convergence of Heterogeneous Learning Dynamics in Zero-sum Stochastic Games

Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds

No-Regret Learning in Time-Varying Zero-Sum Games

Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Learning in Multi-Player Stochastic Games

Finite-Sample Guarantees for Best-Response Learning Dynamics in Zero-Sum Matrix Games

Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Convergence of Learning Dynamics in Stackelberg Games

Learning in games with continuous action sets and unknown payoff functions

Exponentially fast convergence to (strict) equilibrium via hedging

A unified stochastic approximation framework for learning in games

Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains