Online Finite-Horizon Optimal Learning Algorithm for Nonzero-Sum Games with Partially Unknown Dynamics and Constrained Inputs

Xiaohong Cui, Huaguang Zhang, Yanhong Luo, Peifu Zu
DOI: https://doi.org/10.1016/j.neucom.2015.12.021
IF: 6
2016-01-01
Neurocomputing
Abstract: In this paper, an online optimal learning algorithm based on the adaptive dynamic programming (ADP) approach is designed to solve the finite-horizon optimal control problem for multi-player nonzero-sum games with partially unknown dynamics and constrained control inputs. First, it is proved that the online policy iteration (PI) algorithm is equivalent to Newton's iteration. Second, a single neural network (NN) with time-varying activation functions is used for each player to approximate the time-varying solution to the coupled Hamilton–Jacobi–Bellman (HJB) equations in an online and forward-in-time manner. Control constraints are handled through non-quadratic functions. The convergence of the NN-based online optimal learning algorithm for multi-player nonzero-sum games is also proved. Finally, a simulation example illustrates the effectiveness of the proposed algorithm.
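The abstract does not state which non-quadratic function is used. As an illustration only, the sketch below assumes the inverse-hyperbolic-tangent penalty that is common in the constrained-input ADP literature, W(u) = 2 ∫_0^u λ r atanh(v/λ) dv, for a scalar input bounded by |u| < λ. The names `nonquadratic_cost`, `saturated_control`, `lam`, `r`, `g`, and `dV_dx` are hypothetical and not taken from the paper.

```python
import math


def nonquadratic_cost(u, lam, r=1.0):
    """Closed form of W(u) = 2 * integral_0^u lam * r * atanh(v / lam) dv.

    Assumed scalar form; the penalty is only defined for |u| < lam.
    """
    if abs(u) >= lam:
        raise ValueError("input must satisfy |u| < lam")
    return (2.0 * lam * r * u * math.atanh(u / lam)
            + lam ** 2 * r * math.log(1.0 - (u / lam) ** 2))


def saturated_control(g, dV_dx, lam, r=1.0):
    """Minimizer of W(u) + dV_dx * g * u over u (assumed scalar case).

    Setting the derivative 2 * lam * r * atanh(u / lam) + dV_dx * g to zero
    gives a tanh-shaped policy that never exceeds the bound lam in magnitude.
    """
    return -lam * math.tanh(g * dV_dx / (2.0 * lam * r))


if __name__ == "__main__":
    lam = 1.0
    for u in (0.0, 0.1, 0.5, 0.9):
        print(f"u={u:4.1f}  W(u)={nonquadratic_cost(u, lam):.6f}")
    for grad in (0.1, 1.0, 10.0):
        print(f"dV/dx={grad:5.1f}  u*={saturated_control(1.0, grad, lam):.6f}")
```

Under this assumed penalty, W(u) behaves like the quadratic cost r·u² for small inputs and grows more steeply as |u| approaches λ, which is why the minimizing control saturates smoothly instead of being clipped.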