Abstract: We study a subclass of $n$-player stochastic games, namely, stochastic games with independent chains and unknown transition matrices. In this class of games, each player controls its own internal Markov chain, whose transitions do not depend on the states or actions of the other players. However, the players' decisions are coupled through their payoff functions. We assume that players receive only realizations of their payoffs, cannot observe the states and actions of other players, and do not know the transition probability matrices of their own Markov chains. Relying on a compact dual formulation of the game based on occupancy measures, together with confidence sets that maintain high-probability estimates of the unknown transition matrices, we propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE for this class of games. The proposed algorithm has the desired properties of independence, scalability, and convergence. Specifically, under no assumptions on the reward functions, we show that the proposed algorithm converges in polynomial time, in a weaker distance (namely, the averaged Nikaido-Isoda gap), to the set of $\epsilon$-NE policies with arbitrarily high probability. Moreover, assuming the existence of a variationally stable Nash equilibrium policy, we show that the proposed algorithm converges asymptotically to the stable $\epsilon$-NE policy with arbitrarily high probability. In addition to Markov potential games and linear-quadratic stochastic games, this work provides another subclass of $n$-player stochastic games that, under some mild assumptions, admit polynomial-time learning algorithms for finding their stationary $\epsilon$-NE policies.
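
The occupancy-measure formulation mentioned in the abstract can be illustrated for a single player with a small sketch. This is not the paper's algorithm: it assumes a known, ergodic transition kernel and the stationary (long-run) notion of occupancy measure, $\rho(s,a) = \nu(s)\,\pi(a \mid s)$, where $\nu$ is the stationary state distribution under policy $\pi$. The function names and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def occupancy_from_policy(P, policy):
    """Stationary state-action occupancy measure of one player's chain.

    P[s, a, t] is the probability of moving from state s to state t under
    action a; policy[s, a] is the probability of action a in state s.
    Assumes the induced chain is ergodic (illustrative assumption).
    """
    n_states = P.shape[0]
    # State-to-state kernel induced by the policy.
    P_pi = np.einsum("sa,sat->st", policy, P)
    # Solve nu^T P_pi = nu^T together with sum(nu) = 1.
    A = np.vstack([P_pi.T - np.eye(n_states), np.ones((1, n_states))])
    b = np.zeros(n_states + 1)
    b[-1] = 1.0
    nu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return nu[:, None] * policy

def policy_from_occupancy(rho):
    """Recover the policy by normalizing rho over actions in each state."""
    return rho / rho.sum(axis=1, keepdims=True)

# Toy chain with 2 states and 2 actions (made-up numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.3, 0.7]]])
policy = np.array([[0.5, 0.5],
                   [0.25, 0.75]])
rho = occupancy_from_policy(P, policy)
```

In this representation, a policy corresponds to a point in a polytope defined by linear flow constraints, which is what makes a mirror descent update over occupancy measures natural. When the transition matrices are unknown, as in the abstract, the kernel `P` would have to be replaced by an estimate drawn from a confidence set.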

Off-policy Q-learning: Solving Nash Equilibrium of Multi-Player Games with Network-Induced Delay and Unmeasured State.

Generalized Nash Equilibrium Seeking for Networked Noncooperative games with a Dynamic Event-Triggered Mechanism

Dynamic channel selection in unknown environment based on graphical game and multi-Q learning

Efficient off-policy Q-learning for multi-agent systems by solving dual games

Cooperative Path Following Control in Autonomous Vehicles Graphical Games: A Data-Based Off-Policy Learning Approach

Accelerating Nash Q-Learning with Graphical Game Representation and Equilibrium Solving

Continuous-time Distributed Nash Strategy over Switching Topologies with Gain Adaptation

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

Empirical Policy Optimization for n-Player Markov Games

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Distributed Multi-Coalition Games with General Linear Systems over Markovian Switching Networks

Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Resolving Implicit Coordination in Multi-Agent Deep Reinforcement Learning with Deep Q-Networks & Game Theory

Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

Multiplayer Stackelberg-Nash Game for Nonlinear System via Value Iteration-Based Integral Reinforcement Learning

Network games with dynamic players: Stabilization and output convergence to Nash equilibrium

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game

Adaptive approaches for fully distributed Nash equilibrium seeking in networked games