Policy Gradient Reinforcement Learning for Parameterized Continuous-Time Optimal Control

Xindi Yang, Hao Zhang, Zhuping Wang
DOI: https://doi.org/10.1109/ccdc52312.2021.9602755
2021-01-01
Abstract: This paper investigates optimal control for nonlinear continuous-time systems with unknown dynamics. By using reinforcement learning, the requirement for knowledge of the system dynamics is relaxed through the use of offline datasets and online interaction data. To improve online learning performance and remove the persistent excitation condition, the action-state value function and a parameterized control law are introduced into the learning algorithm and the parameter analysis. A policy gradient reinforcement learning algorithm is then presented that learns the optimal parameterized control law while the system is operating and updates it in real time. It is also proven that each iterative control law stabilizes the system. Neural networks are used to approximate the action-state value function and the parameterized control law, and their weights are obtained using the method of weighted residuals. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.