Abstract: Consider $N$ players, each with a $d$-dimensional action set. Each player's utility function consists of their reward function plus a linear term for each dimension, with coefficients that are controlled by the manager. We assume that the game is strongly monotone, so if each player runs gradient descent, the dynamics converge to a unique Nash equilibrium (NE). The NE is typically inefficient in terms of global performance. The global performance of the system can be improved by imposing $K$-dimensional linear constraints on the NE. We therefore want the manager to pick the controlled coefficients that impose the desired constraints on the NE. However, this requires knowing the players' reward functions and their action sets. Obtaining this game structure information is infeasible in a large-scale network and violates the users' privacy. To overcome this, we propose a simple algorithm that learns to shift the NE of the game to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm only requires the violation of the linear constraints as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm, which is based on two time-scale stochastic approximation, guarantees convergence with probability 1 to the set of NE that meet the target linear constraints. We then provide a mean square convergence rate of $O(t^{-1/4})$ for our algorithm. This is the first such bound for two time-scale stochastic approximation where the slower time-scale is a fixed point iteration with a non-expansive mapping. We demonstrate how our scheme can be applied to optimizing a global quadratic cost at NE and to load balancing in resource allocation games. We provide simulations of our algorithm for these scenarios.
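
The two time-scale idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a toy quadratic game with $N = 4$, $d = 1$, unconstrained actions, a single ($K = 1$) constraint on the sum of the actions, Gaussian gradient noise, the step-size exponents, and the manager update `alpha -= eps * A.T @ (A @ x - l)`. The names `run_two_timescale`, `G`, `c`, `A`, and `l` are hypothetical. Players take noisy gradient steps on their own utilities on the fast time-scale, while the manager adjusts the controlled coefficients on the slow time-scale using only the constraint violation as feedback.

```python
import numpy as np

def run_two_timescale(T=20000, seed=0):
    """Toy two time-scale loop; an illustrative sketch, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    N = 4
    # Assumed toy game: player n's utility is
    #   c[n]*x[n] - 0.5*G[n,n]*x[n]**2 - x[n]*sum_{m != n} G[n,m]*x[m] + alpha[n]*x[n].
    # G is symmetric positive definite, so the game is strongly monotone.
    G = 1.75 * np.eye(N) + 0.25 * np.ones((N, N))
    c = np.array([1.0, 2.0, 3.0, 4.0])
    # Assumed target constraint A x = l: total action equals 2.
    A = np.ones((1, N))
    l = np.array([2.0])

    x = np.zeros(N)       # players' actions (fast variable)
    alpha = np.zeros(N)   # manager's controlled coefficients (slow variable)
    for t in range(T):
        eta = 0.3 / (t + 1) ** 0.4   # faster step size (players)
        eps = 0.1 / (t + 1) ** 0.6   # slower step size (manager); eps/eta -> 0
        # Each player follows a noisy gradient of its own utility.
        grad = c + alpha - G @ x + 0.05 * rng.standard_normal(N)
        x = x + eta * grad
        # The manager observes only the constraint violation, not G or c.
        violation = A @ x - l
        alpha = alpha - eps * (A.T @ violation)
    return x, alpha, A @ x - l

x, alpha, violation = run_two_timescale()
print("actions:", x)
print("controlled coefficients:", alpha)
print("final constraint violation:", violation)
```

In this toy setting the uncontrolled NE has a total action of about 3.64, so the manager has to push the coefficients negative to bring the total down to the target of 2. Note that the manager's update never touches `G` or `c`, which mirrors the abstract's requirement that the reward functions remain unknown to the manager.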

Inverse reinforcement learning methods for linear differential games

Inverse Reinforcement Learning for Identification of Linear-Quadratic Zero-Sum Differential Games

Inverse linear-quadratic nonzero-sum differential games

Reinforcement Learning for Inverse Non-Cooperative Linear-Quadratic Output-feedback Differential Games

Reinforcement Learning for Inverse Linear-quadratic Dynamic Non-cooperative Games

Inverse linear quadratic dynamic games using partial state observations

Inverse Reinforcement Learning with Multiple Ranked Experts

Flow Cytometric Characterization of Alveolar Macrophages

Learning Human Behavior in Shared Control: Adaptive Inverse Differential Game Approach

Nash Equilibria for Linear Quadratic Discrete-time Dynamic Games via Iterative and Data-driven Algorithms

Online estimation of objective function for continuous-time deterministic systems

Min–max adaptive dynamic programming for zero-sum differential games

Active Inverse Learning in Stackelberg Trajectory Games

Reinforcement Learning for Non-stationary Discrete-Time Linear–Quadratic Mean-Field Games in Multiple Populations

Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games

Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems

A Differential Dynamic Programming Framework for Inverse Reinforcement Learning

Two person non-zero-sum linear-quadratic differential game with Markovian jumps in infinite horizon

Learning to Control Unknown Strongly Monotone Games

Linear Supervision for Nonlinear, High-Dimensional Neural Control and Differential Games

Asymmetric Feedback Learning in Online Convex Games