Abstract: This paper addresses the inverse problem for linear-quadratic (LQ) nonzero-sum $N$-player differential games. The goal is to learn the parameters of an unknown cost function of a game, called the observed game, from demonstrated trajectories that are known to be generated by stationary linear feedback Nash equilibrium laws. To this end, the demonstrated data are used to construct a synthesized game that is equivalent to the observed game, in the sense that the trajectories generated by the equilibrium feedback laws of the $N$ players in the synthesized game coincide with the demonstrated trajectories. We present a model-based algorithm that accomplishes this task using the given trajectories. We then extend it to a model-free setting that solves the same problem when the system matrices are unknown. The algorithms combine inverse optimal control and reinforcement learning methods, with the latter relying extensively on gradient descent optimization. The analysis of the algorithm focuses on proving its convergence and stability. To further characterize the possible solutions, we show how to generate an infinite number of equivalent games without repeatedly running the complete algorithm. Simulation results validate the effectiveness of the proposed algorithms.
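
For orientation, the setting described in the abstract can be written out using the standard textbook formulation of an $N$-player LQ differential game. This is a sketch under stated assumptions, not the paper's own notation: the symbols $A$, $B_i$, $Q_i$, $R_i$, $K_i$, $P_i$, $\hat{Q}_i$, $\hat{R}_i$ are chosen here for illustration, and each player's cost is assumed to penalize only that player's own control.

```latex
% Assumed notation (not taken from the abstract): A, B_i, Q_i, R_i, K_i, P_i.
\begin{align}
  \dot{x}(t) &= A\,x(t) + \sum_{i=1}^{N} B_i\,u_i(t), \qquad x(0) = x_0, \\
  J_i &= \int_0^{\infty} \left( x^\top Q_i\, x + u_i^\top R_i\, u_i \right) dt,
         \qquad i = 1, \dots, N, \\
  u_i &= -K_i\,x, \qquad K_i = R_i^{-1} B_i^\top P_i, \\
  0 &= A_c^\top P_i + P_i A_c + Q_i + K_i^\top R_i K_i,
         \qquad A_c = A - \sum_{j=1}^{N} B_j K_j .
\end{align}
% Inverse problem, in this notation: given trajectories generated by the
% gains K_1, ..., K_N, find cost parameters (\hat{Q}_i, \hat{R}_i) whose
% feedback Nash equilibrium reproduces those same trajectories.
```

In this notation, two games are equivalent in the abstract's sense when their equilibrium feedback laws produce the same closed-loop trajectories. The cost matrices themselves need not match, which is consistent with the abstract's remark that infinitely many equivalent games exist.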

Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems

Policy Iteration Based Q-learning for Linear Nonzero-Sum Quadratic Differential Games

Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms

Novel single-loop policy iteration for linear zero-sum games

A Policy Iteration Algorithm for N-player General-Sum Linear Quadratic Dynamic Games

Online Adaptive Q-learning Method for Fully Cooperative Linear Quadratic Dynamic Games

Integral Policy Iteration for Zero-Sum Games with Completely Unknown Nonlinear Dynamics

Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games With Completely Unknown Dynamics

Policy-Iteration-Based Learning for Nonlinear Player Game Systems with Constrained Inputs

Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games with Unknown Dynamics

GPI-Based Design for Partially Unknown Nonlinear Two-Player Zero-Sum Games

A Multi-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games

Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games with Application to H-infinity Control

Data-driven Adaptive Dynamic Programming for Partially Observable Nonzero-Sum Games Via Q-learning Method

Learning Algorithms For Differential Games Of Continuous-Time Systems

Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate

Nash Equilibria for Linear Quadratic Discrete-time Dynamic Games via Iterative and Data-driven Algorithms

Inverse linear-quadratic nonzero-sum differential games

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Non‐zero‐sum games of discrete‐time Markov jump systems with unknown dynamics: An off‐policy reinforcement learning method

Data-Driven Inverse Cooperative Game Control Via Off-Policy Q-Learning