Abstract: Consider $N$ players, each with a $d$-dimensional action set. Each player's utility function consists of their reward function plus a linear term for each dimension, with coefficients that are controlled by the manager. We assume that the game is strongly monotone, so if each player runs gradient descent, the dynamics converge to a unique Nash equilibrium (NE). The NE is typically inefficient in terms of global performance. The global performance of the system can be improved by imposing $K$-dimensional linear constraints on the NE. We therefore want the manager to pick the controlled coefficients that impose the desired constraints on the NE. However, this requires knowing the players' reward functions and their action sets. Obtaining this game structure information is infeasible in a large-scale network and violates the users' privacy. To overcome this, we propose a simple algorithm that learns to shift the NE of the game to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm requires only the violation of the linear constraints as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm, which is based on two time-scale stochastic approximation, converges with probability 1 to the set of NE that meet the target linear constraints. We then provide a mean square convergence rate of $O(t^{-1/4})$ for our algorithm. This is the first such bound for two time-scale stochastic approximation where the slower time-scale is a fixed point iteration with a non-expansive mapping. We demonstrate how our scheme can be applied to optimizing a global quadratic cost at NE and to load balancing in resource allocation games. We provide simulations of our algorithm for these scenarios.
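
The two time-scale idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact algorithm: the quadratic game (`Q`, `c`), the box action set `[-5, 5]`, the manager's update direction `A.T @ violation`, and the step-size exponents are all assumptions chosen for illustration. Each player takes a gradient step on its own utility (fast time-scale), while the manager sees only the constraint violation `A @ x - b` and adjusts the coefficients `alpha` with a smaller step (slow time-scale).

```python
import numpy as np

def run_two_timescale(Q, c, A, b, steps=5000, noise_std=0.0, seed=0):
    """Toy two time-scale loop for a hypothetical quadratic game.

    Player n's utility gradient is assumed to be (-Q x + c + alpha)_n,
    i.e. a quadratic reward plus the manager's linear term alpha_n * x_n.
    """
    rng = np.random.default_rng(seed)
    n = len(c)
    x = np.zeros(n)       # players' actions (d = 1 per player here)
    alpha = np.zeros(n)   # coefficients controlled by the manager
    for t in range(steps):
        eta = 0.5 / (t + 1) ** 0.4   # fast step size (players)
        eps = 0.1 / (t + 1) ** 0.6   # slow step size (manager), eps/eta -> 0
        # Fast time-scale: each player follows its own (possibly noisy) gradient.
        grad = -Q @ x + c + alpha + noise_std * rng.standard_normal(n)
        x = np.clip(x + eta * grad, -5.0, 5.0)  # projection onto the assumed box
        # Slow time-scale: the manager only observes the constraint violation.
        violation = A @ x - b
        alpha = alpha - eps * (A.T @ violation)
    return x, alpha

# Hypothetical 3-player instance with one linear constraint: total load = 1.
Q = np.array([[1.0, 0.2, 0.2],
              [0.2, 1.0, 0.2],
              [0.2, 0.2, 1.0]])  # symmetric positive definite => strongly monotone
c = np.array([1.0, 2.0, 3.0])
A = np.ones((1, 3))
b = np.array([1.0])

x, alpha = run_two_timescale(Q, c, A, b)
print("actions:", x)
print("constraint violation:", A @ x - b)
```

The manager never reads `Q` or `c`; it uses only the violation signal, which mirrors the feedback model in the abstract. Setting `noise_std` above zero gives a stochastic variant, at the cost of slower and noisier convergence.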

Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate

An efficient model‐free adaptive optimal control of continuous‐time nonlinear non‐zero‐sum games based on integral reinforcement learning with exploration

Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence

Learning to Control Unknown Strongly Monotone Games

Neural-network-based safe learning control for non-zero-sum differential games of nonlinear systems with asymmetric input constraints

Control of Nonaffine Nonlinear Discrete-Time Systems Using Reinforcement-Learning-Based Linearly Parameterized Neural Networks

Advanced optimal tracking integrating a neural critic technique for asymmetric constrained zero-sum games

Design and Application of an Adaptive Fuzzy Control Strategy to Zeroing Neural Network for Solving Time-Variant QP Problem

Output-feedback Q-learning for discrete-time linear H-infinity tracking control: A Stackelberg game approach

Approximate-optimal control algorithm for constrained zero-sum differential games through event-triggering mechanism

No-Regret Learning in Time-Varying Zero-Sum Games

Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

A Multi-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games

Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback

Inverse linear-quadratic nonzero-sum differential games

Non‐zero‐sum games of discrete‐time Markov jump systems with unknown dynamics: An off‐policy reinforcement learning method

Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

A novel Z-function-based completely model-free reinforcement learning method to finite-horizon zero-sum game of nonlinear system

Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game