Abstract: This paper considers a stochastic Nash game in which each player minimizes an expectation-valued composite objective. We make the following contributions. (I) Under suitable monotonicity assumptions on the concatenated gradient map, we derive ({\bf optimal}) rate statements and oracle complexity bounds for the proposed variable sample-size proximal stochastic gradient-response (VS-PGR) scheme when the sample-size increases at a geometric rate. If the sample-size increases at a polynomial rate of degree $v > 0$, the mean-squared error decays at a corresponding polynomial rate, while the iteration and oracle complexities to obtain an $\epsilon$-NE are $\mathcal{O}(1/\epsilon^{1/v})$ and $\mathcal{O}(1/\epsilon^{1+1/v})$, respectively. (II) We then overlay (VS-PGR) with a consensus phase with a view towards developing distributed protocols for aggregative stochastic Nash games. In the resulting scheme, when the sample-size and the number of consensus steps grow at geometric and linear rates, respectively, computing an $\epsilon$-NE requires iteration and oracle complexities similar to those of (VS-PGR), with a communication complexity of $\mathcal{O}(\ln^2(1/\epsilon))$. (III) Under a suitable contractive property associated with the proximal best-response (BR) map, we design a variable sample-size proximal BR (VS-PBR) scheme, where each player solves a sample-average BR problem. Akin to (I), we provide rate statements together with oracle and iteration complexity bounds. (IV) Akin to (II), the distributed variant achieves iteration and oracle complexities similar to those of the centralized (VS-PBR), with a communication complexity of $\mathcal{O}(\ln^2(1/\epsilon))$ when the number of communication rounds per iteration increases at a linear rate. Finally, we present some preliminary numerics to provide empirical support for the rate and complexity statements.
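
To illustrate the idea behind contribution (I), the following is a minimal sketch of a variable sample-size proximal gradient-response iteration on a toy two-player quadratic game. It is not the paper's implementation: the game data (`A`, `C`), the box constraint $[-1, 1]$, the step size, the growth factor `rho`, and the function name `vs_pgr` are all assumptions chosen for illustration. The concatenated gradient map $F(x) = Ax - c$ is strongly monotone here, each player's proximal step reduces to a projection onto the box, and the batch size grows geometrically as $N_k = \lceil \rho^{-(k+1)} \rceil$.

```python
import math
import numpy as np

# Hypothetical game data (not from the paper): player i minimizes
#   E[ 0.5*2*x_i^2 + x_i*x_j - (c_i + xi_i)*x_i ]  subject to x_i in [-1, 1],
# where xi_i is zero-mean Gaussian noise. The concatenated gradient map is
# F(x) = A x - c, which is strongly monotone because A is positive definite.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
C = np.array([1.0, 1.0])
# The unconstrained solution of A x = c lies inside the box, so it is the NE.
X_STAR = np.linalg.solve(A, C)


def vs_pgr(num_iters=30, step=0.3, rho=0.7, seed=0):
    """Toy variable sample-size proximal gradient response (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.array([-1.0, 1.0])  # arbitrary feasible starting point
    total_samples = 0          # sampled gradients drawn per player
    for k in range(num_iters):
        # Geometrically increasing batch size N_k = ceil(rho^{-(k+1)}).
        n_k = math.ceil(rho ** -(k + 1))
        # Each player averages n_k noisy gradient samples; only the noise
        # term is random, so averaging the noise is equivalent.
        avg_noise = rng.normal(size=(n_k, 2)).mean(axis=0)
        grad = A @ x - C - avg_noise
        # Proximal step for the indicator of [-1, 1] is a projection (clip).
        x = np.clip(x - step * grad, -1.0, 1.0)
        total_samples += n_k
    return x, total_samples


if __name__ == "__main__":
    x_final, used = vs_pgr()
    print("final iterate:", x_final)
    print("equilibrium:  ", X_STAR)
    print("samples per player:", used)
```

In this toy instance the equilibrium is $(1/3, 1/3)$. Because the averaged noise variance shrinks geometrically with $N_k$, the iterates behave increasingly like those of the deterministic projected gradient method, which is the intuition behind the linear rate claimed for geometric sample-size growth.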

Gradient play in stochastic games: stationary points, convergence, and sample complexity

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Almost Sure Convergence of Networked Policy Gradient over Time-Varying Networks in Markov Potential Games

Geometric Convergence of Gradient Play Algorithms for Distributed Nash Equilibrium Seeking

A Payoff-Based Policy Gradient Method in Stochastic Games with Long-Run Average Payoffs

Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games

On Gradient-Based Learning in Continuous Games

A Necessary and Sufficient Condition Beyond Monotonicity for Convergence of the Gradient Play in Continuous Games

Distributed Variable Sample-Size Gradient-response and Best-response Schemes for Stochastic Nash Games over Graphs

Convergence of Policy Gradient Methods for Nash Equilibria in General-sum Stochastic Games

Distributed Variable Sample-Size Gradient-response and Best-response Schemes for Stochastic Nash Equilibrium Problems over Graphs

Provable Policy Gradient Methods for Average-Reward Markov Potential Games

Linearly Convergent Variable Sample-Size Schemes for Stochastic Nash Games: Best-Response Schemes and Distributed Gradient-Response Schemes

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

Convergence of Learning Dynamics in Stackelberg Games

A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

Gradient Dynamics in Linear Quadratic Network Games with Time-Varying Connectivity and Population Fluctuation

Exponential Convergence of Gradient Methods in Concave Network Zero-sum Games

Gradient-based Learning in State-based Potential Games for Self-Learning Production Systems