A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Mingduo Lin, Derong Liu, Bo Zhao, Qionghai Dai, Yi Dong
DOI: https://doi.org/10.1109/icist.2019.8836932
2019-01-01
Abstract: This paper focuses on data-driven controller design for optimal control problems of nonlinear nonaffine discrete-time systems. A novel policy gradient and Q-learning (PGQL) adaptive algorithm, which learns the optimal control policy from real empirical data without requiring the system dynamics, is developed. A policy iteration scheme is designed to iteratively update the approximate Q-function, and the control policy is improved via a gradient method until they converge to the bounded regions of the optimal Q-function and the optimal control policy, respectively. Two neural networks (NNs) are employed to realize the developed algorithm. Moreover, the convergence analysis of the approximate Q-function is established. Since the control policy is parameterized, it can be improved by updating the actor-NN parameters in the direction of the performance gradient. Finally, simulation results are given to verify the performance of the developed PGQL adaptive algorithm.
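The loop described in the abstract (evaluate the current policy's Q-function from recorded data, then move the policy parameters along the gradient of that Q-function) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the scalar plant in `step`, the quadratic stage cost, the step size, and the use of linear-in-parameter approximators (a quadratic critic and a linear actor `u = theta * x`) in place of the two neural networks. The learner only sees recorded transitions `(x, u, x_next)`; the plant model is used solely to generate those transitions and to score the result.

```python
import numpy as np

def step(x, u):
    # Hypothetical nonaffine plant (assumption); used only to generate data
    # and to score the learned policy, never inside the learning updates.
    return 0.8 * x + u - 0.1 * u**3

def features(x, u):
    # Quadratic critic features: Q(x, u) ~ w . [x^2, x*u, u^2] (assumption).
    return np.stack([x * x, x * u, u * u], axis=-1)

def evaluate_policy(theta, x, u, x_next, cost):
    # Policy evaluation: least-squares fit of the Bellman equation
    # Q(x, u) - Q(x', theta * x') = cost(x, u) over the recorded data.
    A = features(x, u) - features(x_next, theta * x_next)
    w, *_ = np.linalg.lstsq(A, cost, rcond=None)
    return w

def improve_policy(theta, w, x, lr):
    # Policy improvement: one gradient step on Q(x, theta * x) w.r.t. theta.
    u = theta * x
    dq_du = w[1] * x + 2.0 * w[2] * u
    grad = np.mean(dq_du * x)  # chain rule: du/dtheta = x
    return theta - lr * grad

def train_actor_critic(n_samples=500, n_iters=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_samples)   # recorded states
    u = rng.uniform(-1.0, 1.0, n_samples)   # recorded exploratory controls
    x_next = step(x, u)                     # recorded successor states
    cost = x**2 + u**2                      # assumed quadratic stage cost
    theta = 0.0                             # initial (stabilizing) policy
    for _ in range(n_iters):
        w = evaluate_policy(theta, x, u, x_next, cost)
        theta = improve_policy(theta, w, x, lr)
    return theta, w

def rollout_cost(theta, x0=1.0, horizon=200):
    # Accumulated cost of the policy u = theta * x on the hypothetical plant.
    x, total = x0, 0.0
    for _ in range(horizon):
        u = theta * x
        total += x**2 + u**2
        x = step(x, u)
    return total
```

Calling `theta, w = train_actor_critic()` and comparing `rollout_cost(theta)` with `rollout_cost(0.0)` gives a quick check that the learned gain lowers the accumulated cost relative to the initial policy. Replacing the quadratic critic and linear actor with neural networks would bring the sketch closer to the two-NN structure the abstract mentions, at the cost of iterative rather than closed-form policy evaluation.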