Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning

Jinna Li,Zhenfei Xiao,Ping Li
DOI: https://doi.org/10.1109/access.2019.2939384
IF: 3.9
2019-01-01
IEEE Access
Abstract:In this paper, an off-policy game Q-learning algorithm is proposed for solving linear discrete-time non-zero sum multi-player game problems. Unlike the existing Q-learning methods for solving the Riccati equation by on-policy learning approaches for multi-player games, an off-policy game Q-learning method is developed for achieving the Nash equilibrium of multiple players. To this end, first, a non-zero sum game problem is formulated, and the value function and the Q-function defined according to each-player individual performance index are rigorously proved to be linear quadratic forms. Then, based on the dynamic programming and Q-learning methods, an off-policy game Q-learning algorithm is developed to find the control policies for multi-player games, such that the Nash equilibrium is reached under the learned control policies. The merit of this paper lies in that the proposed algorithm does not require the system model parameters to be known a priori and fully utilizes measurable data to learn the Nash equilibrium solution. Moreover, there is no bias of Nash equilibrium solution when implementing the proposed off-policy game Q-learning algorithm even though probing noises are added to control policies for maintaining the persistent excitation condition. While bias of the Nash equilibrium solution could be produced if on-policy game Q-learning is employed. This is another contribution of this paper
What problem does this paper attempt to address?