Abstract: Consider $N$ players, each with a $d$-dimensional action set. Each player's utility function consists of their reward function plus a linear term for each dimension, with coefficients that are controlled by the manager. We assume that the game is strongly monotone, so if each player runs gradient descent, the dynamics converge to a unique Nash equilibrium (NE). The NE is typically inefficient in terms of global performance, which can be improved by imposing $K$-dimensional linear constraints on the NE. We therefore want the manager to pick the controlled coefficients that impose the desired constraints on the NE. However, this requires knowing the players' reward functions and their action sets. Obtaining this game structure information is infeasible in a large-scale network and violates the users' privacy. To overcome this, we propose a simple algorithm that learns to shift the NE of the game to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm only requires the violation of the linear constraints as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm, which is based on two time-scale stochastic approximation, guarantees convergence with probability 1 to the set of NE that meet the target linear constraints. We then provide a mean square convergence rate of $O(t^{-1/4})$ for our algorithm. This is the first such bound for two time-scale stochastic approximation where the slower time-scale is a fixed point iteration with a non-expansive mapping. We demonstrate how our scheme can be applied to optimizing a global quadratic cost at NE and to load balancing in resource allocation games. We provide simulations of our algorithm for these scenarios.
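
To make the two time-scale idea in the abstract concrete, here is a minimal sketch and not the paper's actual algorithm. It assumes a toy game that does not come from the paper: three players with scalar, unconstrained actions ($d = 1$), quadratic rewards with weak coupling (chosen so the game is strongly monotone), and a single load-balancing constraint ($K = 1$) asking the actions to sum to a target. The function name `simulate`, the parameters `a`, `b`, `c`, and the step-size exponents are all hypothetical. Players take noisy gradient steps on their own utility with a larger step size, while the manager adjusts one shared linear coefficient `lam` with a smaller step size, using only the observed constraint violation.

```python
import random


def simulate(steps=20000, target=1.5, noise_std=0.05, seed=0):
    """Toy two time-scale loop: fast player updates, slow manager update."""
    rng = random.Random(seed)
    a = [2.0, 2.0, 2.0]   # reward curvature per player (hypothetical)
    b = [1.0, 2.0, 3.0]   # linear reward coefficients (hypothetical)
    c = 0.3               # coupling between players (hypothetical)
    n = len(a)
    x = [0.0] * n         # players' actions
    lam = 0.0             # coefficient controlled by the manager

    for t in range(steps):
        eta = 0.3 / (t + 1) ** 0.5    # fast step size (players)
        eps = 0.3 / (t + 1) ** 0.75   # slow step size (manager)
        total = sum(x)
        # The manager observes only the constraint violation,
        # never the reward parameters a, b, c.
        violation = total - target
        new_x = []
        for i in range(n):
            # Noisy gradient of player i's utility:
            # reward gradient plus the controlled linear term lam.
            grad = (-a[i] * x[i] + b[i] - c * (total - x[i])
                    + lam + rng.gauss(0.0, noise_std))
            new_x.append(x[i] + eta * grad)
        x = new_x
        # Lower the coefficient when the load exceeds the target,
        # raise it when the load falls short.
        lam -= eps * violation

    return x, lam, sum(x) - target


if __name__ == "__main__":
    actions, coefficient, final_violation = simulate()
    print("actions:", actions)
    print("controlled coefficient:", coefficient)
    print("constraint violation:", final_violation)
```

In this toy setting the equilibrium is available in closed form, so the coefficient that meets the constraint can be computed directly (it is $-0.7$ for the values above). The point of the sketch is that the manager reaches it without ever reading `a`, `b`, or `c`. The ratio `eps / eta` shrinks over time, which is what lets the players' actions track the equilibrium induced by the current coefficient.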

Neural-network-based Learning Algorithms for Cooperative Games of Discrete-Time Multi-Player Systems with Control Constraints Via Adaptive Dynamic Programming

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

Learning Intra-group Cooperation in Multi-agent Systems.

Cooperative Path Following Control in Autonomous Vehicles Graphical Games: A Data-Based Off-Policy Learning Approach

Value Iteration-Based Cooperative Adaptive Optimal Control for Multi-Player Differential Games With Incomplete Information

Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Optimal Leader-Following Consensus Control of Multi-Agent Systems: A Neural Network Based Graphical Game Approach

Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game

Online Reinforcement Learning-based Neural Network Controller Design for Affine Nonlinear Discrete-time Systems.

Online optimal consensus control of unknown linear multi-agent systems via time-based adaptive dynamic programming

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Adaptive Dynamic Programming for Nonaffine Nonlinear Optimal Control Problem with State Constraints

Inverse optimal stabilization of cooperative control in networked multi-agent systems

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

An efficient model‐free adaptive optimal control of continuous‐time nonlinear non‐zero‐sum games based on integral reinforcement learning with exploration

Differential-game for resource aware approximate optimal control of large-scale nonlinear systems with multiple players

Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system

Adaptive Dynamic Programming for a Nonlinear Two‐Player Non‐Zero‐Sum Differential Game With State and Input Constraints

Learning to Control Unknown Strongly Monotone Games

A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems