Abstract: In this paper, an off-policy game Q-learning algorithm is proposed for solving linear discrete-time non-zero-sum multi-player game problems. Unlike existing Q-learning methods, which solve the Riccati equations of multi-player games by on-policy learning, the proposed off-policy game Q-learning method is developed to achieve the Nash equilibrium of multiple players. To this end, a non-zero-sum game problem is first formulated, and the value function and the Q-function defined according to each player's individual performance index are rigorously proved to be linear quadratic forms. Then, based on dynamic programming and Q-learning, an off-policy game Q-learning algorithm is developed to find the control policies for multi-player games, such that the Nash equilibrium is reached under the learned control policies. The merit of this paper is that the proposed algorithm does not require the system model parameters to be known a priori and fully utilizes measurable data to learn the Nash equilibrium solution. Moreover, the proposed off-policy game Q-learning algorithm introduces no bias in the Nash equilibrium solution even when probing noises are added to the control policies to maintain the persistent excitation condition, whereas on-policy game Q-learning could produce such bias. This is another contribution of this paper.
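
To make the "linear quadratic form" claim concrete, the following is a sketch of one common way such a game is written. The abstract gives no formulas, so every symbol here ($A$, $B_i$, $Q_i$, $R_{iq}$, $K_i$, $P_i$, $H_i$, $z_k$) is an assumption introduced for illustration and need not match the paper's notation.

\[
\begin{aligned}
x_{k+1} &= A x_k + \sum_{i=1}^{n} B_i u_{ik}, \\
J_i &= \sum_{k=0}^{\infty} \Big( x_k^{\top} Q_i x_k + \sum_{q=1}^{n} u_{qk}^{\top} R_{iq} u_{qk} \Big), \qquad i = 1, \dots, n, \\
V_i(x_k) &= x_k^{\top} P_i x_k, \\
Q_i(x_k, u_{1k}, \dots, u_{nk}) &= z_k^{\top} H_i z_k, \qquad z_k = \begin{bmatrix} x_k^{\top} & u_{1k}^{\top} & \cdots & u_{nk}^{\top} \end{bmatrix}^{\top}.
\end{aligned}
\]

Under these assumptions, with stabilizing linear feedback policies $u_{ik} = -K_i x_k$, evaluating the Q-function along the policies recovers the value function, so that $P_i = M^{\top} H_i M$ with $M = \begin{bmatrix} I & -K_1^{\top} & \cdots & -K_n^{\top} \end{bmatrix}^{\top}$.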

Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning

Off-policy Q-learning: Solving Nash Equilibrium of Multi-Player Games with Network-Induced Delay and Unmeasured State.

Policy Iteration Based Q-learning for Linear Nonzero-Sum Quadratic Differential Games.

Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems

Output Feedback H∞ Control for Linear Discrete-Time Multi-Player Systems with Multi-Source Disturbances Using Off-Policy Q-Learning.

Robust Optimal Tracking Control for Multiplayer Systems by Off‐policy Q‐learning Approach

Efficient off‐policy Q‐learning for multi‐agent systems by solving dual games

Online Adaptive Q-learning Method for Fully Cooperative Linear Quadratic Dynamic Games

Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics

Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games with Unknown Dynamics.

Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms

Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning

Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games.

Off-policy Synchronous Iteration IRL Method for Multi-Player Zero-Sum Games with Input Constraints

Non‐zero‐sum games of discrete‐time Markov jump systems with unknown dynamics: An off‐policy reinforcement learning method

Off-policy Based Adaptive Dynamic Programming Method for Nonzero-Sum Games on Discrete-Time System

Data-driven Adaptive Dynamic Programming for Partially Observable Nonzero-Sum Games Via Q-learning Method

Policy-Iteration-Based Learning for Nonlinear Player Game Systems with Constrained Inputs.

Online Finite-Horizon Optimal Learning Algorithm for Nonzero-Sum Games with Partially Unknown Dynamics and Constrained Inputs

Optimal Tracking Control for Non-Zero-sum Games of Linear Discrete-Time Systems Via Off-Policy Reinforcement Learning

Neural-network-based Synchronous Iteration Learning Method for Multi-Player Zero-Sum Games.