Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics

Ruizhuo Song, Qinglai Wei, Huaguang Zhang, Frank L. Lewis
DOI: https://doi.org/10.1109/TCYB.2019.2957406
IF: 11.8
2021-01-01
IEEE Transactions on Cybernetics
Abstract: In this article, an off-policy reinforcement learning (RL) algorithm is established to solve discrete-time N-player nonzero-sum (NZS) games with completely unknown dynamics. The N-coupled generalized algebraic Riccati equations (GARE) are derived, and a policy iteration (PI) algorithm is then used to obtain the N-tuple of iterative controls and the iterative value function. Because the PI algorithm requires the system dynamics, an off-policy RL method is developed for discrete-time N-player NZS games. The off-policy N-coupled Hamilton-Jacobi (HJ) equation is derived based on quadratic value functions. Using the Kronecker product, the N-coupled HJ equation is decomposed into an unknown-parameter part and a system-operation-data part, which allows the equation to be solved without knowledge of the system dynamics. The least-squares method is used to calculate the iterative value function and the N-tuple of iterative controls. The existence of a Nash equilibrium is proved. Simulation examples illustrate the results of the proposed method for discrete-time NZS games with unknown dynamics.
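The Kronecker-product step mentioned in the abstract rests on a standard identity: for a quadratic value function, x^T P x = (x ⊗ x)^T vec(P), so the unknown matrix P enters linearly and can be estimated from data by least squares. The sketch below illustrates only that identity on synthetic samples; it is not the paper's off-policy algorithm, and the names `quad_features`, `fit_quadratic_value`, and `P_true` are hypothetical, chosen for illustration.

```python
import numpy as np


def quad_features(x):
    """Return kron(x, x), so that x^T P x == quad_features(x) @ P.reshape(-1)."""
    return np.kron(x, x)


def fit_quadratic_value(xs, values):
    """Least-squares estimate of a symmetric P from samples values[k] = xs[k]^T P xs[k]."""
    n = xs.shape[1]
    # Data part: one row of Kronecker features per sampled state.
    phi = np.array([quad_features(x) for x in xs])
    # Unknown-parameter part: vec(P), solved in the least-squares sense.
    theta, *_ = np.linalg.lstsq(phi, values, rcond=None)
    p = theta.reshape(n, n)
    # A quadratic form only determines the symmetric part of P.
    return 0.5 * (p + p.T)


# Synthetic data (an assumption for illustration, not taken from the paper).
rng = np.random.default_rng(0)
P_true = np.array([[2.0, 0.5], [0.5, 1.0]])
xs = rng.standard_normal((20, 2))
values = np.einsum("ki,ij,kj->k", xs, P_true, xs)

P_hat = fit_quadratic_value(xs, values)
```

Because the features for x_i x_j and x_j x_i coincide, the regression matrix is rank deficient; `np.linalg.lstsq` returns the minimum-norm solution, and the final symmetrization makes the recovered matrix well defined.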