Abstract: This paper integrates game theory, optimal control theory, and reinforcement learning to address the discrete-time (DT) multi-player non-zero-sum game problem. The solutions to non-zero-sum games are known to be given by coupled Riccati equations or coupled Hamilton–Jacobi equations, which are generally difficult to solve analytically and require an accurate mathematical model of the system. For most practical industrial systems, however, the dynamics cannot be obtained accurately or are unavailable altogether, so conventional model-based methods are not applicable. To overcome this limitation, we develop data-based adaptive dynamic programming (ADP) algorithms for completely unknown multi-player systems. Firstly, the Nash equilibrium and stationarity conditions are used to formulate the DT multi-player non-zero-sum game, and a policy iteration algorithm is then applied to successively approximate the optimal solutions. Secondly, a novel online ADP algorithm combined with a neural-network-based identification scheme is designed; it requires only system data rather than the true system model. Subsequently, a data-driven action-dependent heuristic dynamic programming approach is presented, which circumvents the estimation errors introduced by the identification procedure. Finally, two simulation examples are provided to illustrate the feasibility of the proposed schemes.
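
To make the policy iteration idea concrete, the following is a minimal sketch under strong simplifying assumptions and is not the paper's algorithm. It treats a scalar, linear, fully known two-player system with quadratic costs, whereas the paper targets unknown multi-player dynamics and uses neural networks. The dynamics x_{k+1} = a x_k + b_1 u_1 + b_2 u_2, the stage costs q_i x^2 + r_i u_i^2, the linear policies u_i = -k_i x, and all function and parameter names are hypothetical choices made for illustration.

```python
def cost(i, k, a, b, q, r):
    """Cost of player i from x0 = 1 when both players use u_j = -k[j] * x.

    Assumed model: x_{k+1} = a*x + b[0]*u_1 + b[1]*u_2 with stage cost
    q[i]*x^2 + r[i]*u_i^2. The infinite sum is a geometric series and is
    finite only if the closed loop is stable.
    """
    a_cl = a - b[0] * k[0] - b[1] * k[1]
    if abs(a_cl) >= 1.0:
        raise ValueError("policy pair is not stabilizing")
    return (q[i] + r[i] * k[i] ** 2) / (1.0 - a_cl ** 2)


def policy_iteration(a, b, q, r, iters=200):
    """Alternate policy evaluation and policy improvement for two players."""
    k = [0.0, 0.0]  # zero gains are admissible only when |a| < 1
    p = [0.0, 0.0]
    for _ in range(iters):
        # Policy evaluation: value coefficient p[i], with V_i(x) = p[i] * x^2.
        p = [cost(i, k, a, b, q, r) for i in range(2)]
        # Policy improvement from the stationarity condition dH_i/du_i = 0,
        # holding the other player's current gain fixed (simultaneous update).
        k = [
            p[i] * b[i] * (a - b[1 - i] * k[1 - i]) / (r[i] + b[i] ** 2 * p[i])
            for i in range(2)
        ]
    return k, p


if __name__ == "__main__":
    a, b, q, r = 0.8, [1.0, 0.5], [1.0, 1.0], [1.0, 1.0]
    gains, values = policy_iteration(a, b, q, r)
    print("feedback gains:", gains)
    print("value coefficients:", values)
```

For the parameters shown, the iteration settles on a pair of gains where neither player can lower its own cost by changing only its own gain, which is the Nash equilibrium property restricted to linear feedback policies. Convergence of this simultaneous update is not guaranteed for arbitrary parameters.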

Off-policy Based Adaptive Dynamic Programming Method for Nonzero-Sum Games on Discrete-Time System

Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms

Non‐zero‐sum games of discrete‐time Markov jump systems with unknown dynamics: An off‐policy reinforcement learning method

Data-driven Adaptive Dynamic Programming Schemes for Non-Zero-sum Games of Unknown Discrete-Time Nonlinear Systems

Robust Adaptive Dynamic Programming of Two-Player Zero-Sum Games for Continuous-Time Linear Systems

Data-driven Adaptive Dynamic Programming for Partially Observable Nonzero-Sum Games Via Q-learning Method

Model-free Adaptive Dynamic Programming for Online Optimal Solution of the Unknown Nonlinear Zero-Sum Differential Game

Policy Gradient Adaptive Dynamic Programming for Nonlinear Discrete-Time Zero-Sum Games with Unknown Dynamics

Optimal Tracking Control for Non-Zero-sum Games of Linear Discrete-Time Systems Via Off-Policy Reinforcement Learning

Robust Adaptive Dynamic Programming for A Zero-Sum Differential Game

Adaptive Dynamic Programming for Solving Non-Zero-Sum Differential Games

Model‐free Adaptive Optimal Control of Continuous‐time Nonlinear Non‐zero‐sum Games Based on Reinforcement Learning

Online Iterative Adaptive Dynamic Programming Approach for Solving the Zero-Sum Game for Nonlinear Continuous-Time Systems with Partially Unknown Dynamics

Online Finite-Horizon Optimal Learning Algorithm for Nonzero-Sum Games with Partially Unknown Dynamics and Constrained Inputs

Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game

Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics

Off-policy Integral Reinforcement Learning Algorithm in Dealing with Nonzero Sum Game for Nonlinear Distributed Parameter Systems

Robust Adaptive Dynamic Programming for A Three-Player Zero-Sum Differential Game with Unmatched Uncertainties

Event-Triggered Adaptive Dynamic Programming for Non-Zero-Sum Games of Unknown Nonlinear Systems Via Generalized Fuzzy Hyperbolic Models

Particle Swarm Optimization-Based Neuro-Dynamic Programming for Nonzero-Sum Games of Multi-Player Nonlinear Systems

Event-Triggered Adaptive Dynamic Programming for Continuous-Time Nonlinear Two-Player Zero-Sum Game