Abstract: Consider $N$ players, each with a $d$-dimensional action set. Each player's utility function consists of their reward function plus a linear term for each dimension, with coefficients that are controlled by the manager. We assume that the game is strongly monotone, so if each player runs gradient descent, the dynamics converge to a unique Nash equilibrium (NE). The NE is typically inefficient in terms of global performance. The global performance of the system can be improved by imposing $K$-dimensional linear constraints on the NE. We therefore want the manager to pick the controlled coefficients that impose the desired constraints on the NE. However, this requires knowing the players' reward functions and their action sets. Obtaining this game structure information is infeasible in a large-scale network and would violate the users' privacy. To overcome this, we propose a simple algorithm that learns to shift the NE of the game to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm requires only the violation of the linear constraints as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm, which is based on two time-scale stochastic approximation, converges with probability 1 to the set of NE that meet the target linear constraints. We then provide a mean square convergence rate of $O(t^{-1/4})$ for our algorithm. This is the first such bound for two time-scale stochastic approximation in which the slower time-scale is a fixed-point iteration with a non-expansive mapping. We demonstrate how our scheme can be applied to optimizing a global quadratic cost at the NE and to load balancing in resource allocation games. We provide simulations of our algorithm for these scenarios.
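
The two time-scale idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the quadratic rewards (`a`, `b`, `c`), the box action set `[0, 10]`, the single constraint "actions sum to 3", the step-size exponents, and the manager's update rule, which simply moves the coefficients against the observed constraint violation. The players follow noisy gradients of their own utilities on the fast time-scale, while the manager sees only the constraint violation and adjusts the coefficients `alpha` on the slow time-scale.

```python
import numpy as np

def run_control(T=20000, seed=0):
    """Toy two time-scale loop: players learn fast, the manager adapts slowly."""
    rng = np.random.default_rng(seed)

    # Hypothetical quadratic game (assumption): player n has utility
    #   -0.5*a[n]*x[n]**2 + b[n]*x[n] - c*x[n]*sum_{m != n} x[m] + alpha[n]*x[n]
    a = np.array([2.0, 3.0, 4.0])
    b = np.array([4.0, 5.0, 6.0])
    c = 0.5
    N = a.size
    # M is symmetric positive definite, so this toy game is strongly monotone.
    M = np.diag(a) + c * (np.ones((N, N)) - np.eye(N))

    # Hypothetical K = 1 linear constraint A x = target (total action equals 3).
    A = np.ones((1, N))
    target = np.array([3.0])

    x = np.zeros(N)      # players' actions (d = 1 per player)
    alpha = np.zeros(N)  # manager-controlled linear coefficients

    for t in range(1, T + 1):
        eta = 0.2 / t ** 0.5    # fast step size (players)
        eps = 0.5 / t ** 0.75   # slow step size (manager)

        # Each player's noisy gradient of its own utility w.r.t. its own action.
        grad = b + alpha - M @ x + 0.01 * rng.standard_normal(N)

        # The manager observes only the constraint violation, not the rewards.
        violation = A @ x - target

        # Players take a projected gradient step on their action sets.
        x = np.clip(x + eta * grad, 0.0, 10.0)

        # Manager shifts the coefficients against the violation (assumed rule).
        alpha = alpha - eps * (A.T @ violation)

    return x, alpha, M, b, A @ x - target


if __name__ == "__main__":
    x, alpha, M, b, violation = run_control()
    print("actions:", x)
    print("coefficients:", alpha)
    print("constraint violation:", violation)
```

In this sketch the manager never touches `a`, `b`, `c`, or the action sets; they appear only inside the simulated players. The slower decay of `eta` relative to `eps` is what lets the players' actions track the NE induced by the current coefficients.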

Learning with Delayed Payoffs in Population Games using Kullback-Leibler Divergence Regularization

Learning Equilibrium with Estimated Payoffs in Population Games

Gradient Dynamics in Linear Quadratic Network Games with Time-Varying Connectivity and Population Fluctuation

Penalty-Regulated Dynamics and Robust Learning Procedures in Games

The equivalence of dynamic and strategic stability under regularized learning in games

Large Population Games in Radial Loss Networks: Computationally Tractable Equilibria for Distributed Network Admission Control

Learning to Control Unknown Strongly Monotone Games

Game Dynamics and Equilibrium Computation in the Population Protocol Model

Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Distributed Computation of Nash Equilibria for Monotone Aggregative Games via Iterative Regularization

Learning, evolution and population dynamics

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

On Passivity, Reinforcement Learning and Higher-Order Learning in Multi-Agent Finite Games

Indian Buffet Game with Negative Network Externality and Non-Bayesian Social Learning

Learning enables adaptation in cooperation for multi-player stochastic games

Learning in Time-Varying Monotone Network Games with Dynamic Populations

Convergent Learning Algorithms for Unknown Reward Games

Stochastic Delay Differential Games: Financial Modeling and Machine Learning Algorithms

Convergence of Learning Dynamics in Stackelberg Games

Learning in Multi-level Stochastic games with Delayed Information

Evolutionary Dynamics of Population Games With an Aspiration-Based Learning Rule