Off-policy Based Adaptive Dynamic Programming Method for Nonzero-Sum Games on Discrete-Time System

Yinlei Wen, Huaguang Zhang, He Ren, Kun Zhang
DOI: https://doi.org/10.1016/j.jfranklin.2020.05.038
IF: 4.246
2020-01-01
Journal of the Franklin Institute
Abstract: This paper introduces a novel model-free, off-policy reinforcement learning method for solving nonzero-sum games of discrete-time linear systems. Unlike the traditional policy iteration (PI) method, which requires knowledge of the system dynamics, the proposed method can be trained directly on state data. Moreover, the traditional PI method is proved to be affected by probing noise. In the analysis of the proposed method, probing noise is considered explicitly and proved to have no influence on convergence. The optimal Nash equilibrium solution is derived. It is also proved that the proposed algorithm can be applied both online and offline. A simulation of a nonzero-sum game control problem on a discrete-time F-16 aircraft system is presented, and the results verify the effectiveness of the proposed algorithm.
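To illustrate why the traditional PI baseline mentioned in the abstract depends on the system dynamics, the following is a minimal sketch of a textbook-style model-based policy iteration for a linear-quadratic nonzero-sum game. It is not taken from the paper. The dynamics x_{k+1} = A x_k + sum_j B_j u_j, the quadratic costs, the Jacobi-style gain update, and all names (`evaluate_policy`, `policy_iteration`, the scalar example values) are assumptions made for illustration, and this simple iteration is not guaranteed to converge for every game.

```python
import numpy as np

def evaluate_policy(A_cl, Q_bar):
    """Solve the Lyapunov equation P = A_cl^T P A_cl + Q_bar by vectorisation."""
    n = A_cl.shape[0]
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    return np.linalg.solve(lhs, Q_bar.reshape(-1)).reshape(n, n)

def policy_iteration(A, B, Q, R, K, iters=100):
    """Model-based PI sketch for an N-player game with policies u_i = -K[i] x.

    B, Q, K are lists indexed by player; R[i][j] weights player j's input
    in player i's cost. Note that A and B are used explicitly in every step,
    which is the model knowledge the abstract refers to.
    """
    N = len(B)
    P = []
    for _ in range(iters):
        # Policy evaluation: one Lyapunov equation per player.
        A_cl = A - sum(B[j] @ K[j] for j in range(N))
        P = []
        for i in range(N):
            Q_bar = Q[i] + sum(K[j].T @ R[i][j] @ K[j] for j in range(N))
            P.append(evaluate_policy(A_cl, Q_bar))
        # Policy improvement: each player responds to the others' current gains.
        K_new = []
        for i in range(N):
            A_others = A - sum(B[j] @ K[j] for j in range(N) if j != i)
            lhs = R[i][i] + B[i].T @ P[i] @ B[i]
            K_new.append(np.linalg.solve(lhs, B[i].T @ P[i] @ A_others))
        K = K_new
    # P holds the value matrices evaluated just before the final gain update.
    return K, P

# Hypothetical scalar two-player example (values chosen only for illustration).
A = np.array([[0.5]])
B = [np.array([[0.5]]), np.array([[0.5]])]
Q = [np.eye(1), np.eye(1)]
one, zero = np.eye(1), np.zeros((1, 1))
R = [[one, zero], [zero, one]]
K0 = [np.zeros((1, 1)), np.zeros((1, 1))]  # admissible because A is stable

K, P = policy_iteration(A, B, Q, R, K0)
```

Running one further iteration from the returned gains and checking that they no longer change is a simple way to confirm that a fixed point of the coupled equations has been reached in this example.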