Off-policy Q-learning: Solving Nash Equilibrium of Multi-Player Games with Network-Induced Delay and Unmeasured State.

Jinna Li, Zhenfei Xiao, Jialu Fan, Tianyou Chai, Frank L. Lewis
DOI: https://doi.org/10.1016/j.automatica.2021.110076
IF: 6.4
2021-01-01
Automatica
Abstract: In the framework of adaptive dynamic programming combined with Q-learning, this paper investigates networked multi-player games in which the common state of the plant is transmitted to all players via a network. The goal is to find the Nash equilibrium solution without requiring the system matrices to be known, even in the presence of network-induced delay and when the system state cannot be measured directly. By adding an observer to estimate the system state and a virtual Smith predictor to predict it, the control policies of the players can be designed. A novel off-policy Q-learning algorithm is then proposed to learn the Nash equilibrium solution by solving the coupled algebraic Riccati equations using available data, followed by a rigorous proof of convergence of the proposed algorithm. Finally, an example is given to show the effectiveness of the proposed method.
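To make the role of state prediction under network-induced delay concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a discrete-time linear plant x(k+1) = A x(k) + sum_i B_i u_i(k) with a known delay of d steps and, unlike the model-free setting of the paper, known matrices A and B_i. The function name `predict_current_state` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def predict_current_state(A, B_list, x_delayed, past_inputs):
    """Roll a delayed state forward through the inputs applied since then.

    x_delayed is the state received over the network, i.e. x(k-d).
    past_inputs[j][i] is the input of player i at step k-d+j, j = 0..d-1.
    Assumption: A and B_list are known (the paper avoids this requirement).
    """
    x = np.asarray(x_delayed, dtype=float)
    for step_inputs in past_inputs:
        x = A @ x + sum(B @ u for B, u in zip(B_list, step_inputs))
    return x


# Hypothetical two-player plant with a shared two-dimensional state.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B_list = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.5]])]
d = 3  # assumed network-induced delay, in steps

# Simulate the plant for 10 steps with arbitrary player inputs.
x = np.array([1.0, -1.0])
states, inputs = [x], []
for _ in range(10):
    u = [rng.standard_normal(1) for _ in B_list]
    inputs.append(u)
    x = A @ x + sum(B @ ui for B, ui in zip(B_list, u))
    states.append(x)

# At step k only x(k-d) has arrived; predict x(k) from it.
k = 10
x_hat = predict_current_state(A, B_list, states[k - d], inputs[k - d:k])
```

In this noise-free setting the prediction reproduces the true current state, which is why a predictor lets each player act as if the delay were absent. The paper's contribution is to obtain the equilibrium policies from data, without the model knowledge this sketch relies on.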