Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems

Biao Luo, Yin Yang, Derong Liu
DOI: https://doi.org/10.1109/TCYB.2020.2970969
IF: 11.8
2021-01-01
IEEE Transactions on Cybernetics
Abstract: In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. In theory, this problem is solved through the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, the Q-function is introduced, and a data-based policy iteration Q-learning (PIQL) algorithm is developed to learn the optimal Q-function from data collected from the real system. By writing the Q-function in quadratic form, it is proved, using the Fréchet derivative, that the PIQL algorithm is equivalent to the Newton iteration method in a Banach space. The convergence of the PIQL algorithm is then guaranteed by Kantorovich's theorem. To implement the PIQL algorithm, an off-policy learning scheme is proposed that uses real data rather than the system model. Finally, the efficiency of the developed data-based PIQL method is validated through simulation studies.
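The loop described in the abstract (evaluate a quadratic Q-function from data, then improve both players' policies) can be sketched as below. This is an illustrative reading, not the authors' code. It assumes the standard linear zero-sum setting x_{k+1} = A x_k + B u_k + E w_k with stage cost x'Qx + u'Ru - gamma^2 w'w, a quadratic Q-function Q(x, u, w) = z'Hz with z = [x; u; w], linear policies u = -Kx (minimizer) and w = -Lx (maximizer), and an off-policy Bellman equation solved by least squares. The helper names (quad_features, theta_to_H, collect_data, piql), the example matrices, gamma = 5, and the stopping rule are all hypothetical choices made for this sketch.

```python
import numpy as np


def quad_features(z):
    """phi(z) with z'Hz = phi(z) @ theta, theta = upper-triangular entries of symmetric H."""
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * z[i] * z[j]


def theta_to_H(theta, dim):
    """Rebuild the symmetric kernel matrix H from its upper-triangular entries."""
    H = np.zeros((dim, dim))
    i, j = np.triu_indices(dim)
    H[i, j] = theta
    H[j, i] = theta
    return H


def collect_data(A, B, E, n_samples, rng):
    """Hypothetical stand-in for the real system: one trajectory driven by random
    exploratory u and w. The learner below only sees these samples, never A, B, E."""
    n, m, q = A.shape[0], B.shape[1], E.shape[1]
    X, U, W, Xn = [], [], [], []
    x = rng.standard_normal(n)
    for _ in range(n_samples):
        u, w = rng.standard_normal(m), rng.standard_normal(q)
        x_next = A @ x + B @ u + E @ w
        X.append(x); U.append(u); W.append(w); Xn.append(x_next)
        x = x_next
    return tuple(np.array(a) for a in (X, U, W, Xn))


def piql(data, Qx, R, gamma, K0, L0, max_iter=50, tol=1e-9):
    """Off-policy policy-iteration Q-learning sketch (assumed form, not the paper's code)."""
    X, U, W, Xn = data
    n, m = X.shape[1], U.shape[1]
    dim = n + m + W.shape[1]
    # Stage cost r_k = x'Qx + u'Ru - gamma^2 w'w, computed once from the data.
    r = (np.einsum("ki,ij,kj->k", X, Qx, X)
         + np.einsum("ki,ij,kj->k", U, R, U)
         - gamma**2 * np.einsum("ki,ki->k", W, W))
    Phi = np.array([quad_features(z) for z in np.hstack([X, U, W])])
    K, L = K0, L0
    for it in range(1, max_iter + 1):
        # Policy evaluation: z_k'H z_k - zbar'H zbar = r_k, where zbar applies the
        # target policies (not the behavior inputs) at x_{k+1}; solved by least squares.
        Zn = np.hstack([Xn, -Xn @ K.T, -Xn @ L.T])
        Phin = np.array([quad_features(z) for z in Zn])
        theta, *_ = np.linalg.lstsq(Phi - Phin, r, rcond=None)
        H = theta_to_H(theta, dim)
        # Policy improvement: stationary (saddle) point of z'Hz in v = [u; w],
        # giving [K; L] = H_vv^{-1} H_vx.
        G = np.linalg.solve(H[n:, n:], H[n:, :n])
        K_new, L_new = G[:m], G[m:]
        step = max(np.abs(K_new - K).max(), np.abs(L_new - L).max())
        K, L = K_new, L_new
        if step < tol:
            break
    return H, K, L, it


# Hypothetical example system (not taken from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.2]])
Qx, R, gamma = np.eye(2), np.eye(1), 5.0
data = collect_data(A, B, E, 200, np.random.default_rng(0))
H, K, L, iters = piql(data, Qx, R, gamma, np.zeros((1, 2)), np.zeros((1, 2)))
print(iters, K, L)
```

Because the stage cost is paired with the target policies only at x_{k+1}, the behavior inputs u_k and w_k can be arbitrary exploration signals, which is what makes the scheme off-policy. At convergence, P = H_xx - H_xv H_vv^{-1} H_vx should satisfy the DTGARE, so the model (A, B, E) is needed only to check the result, not to learn it.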