Learning-Based N-Step Heuristic Dynamic Programming for Affine Nonlinear Optimal Regulation

Peng Xin, Ding Wang, Mingming Zhao, Mingming Ha, Jin Ren
DOI: https://doi.org/10.23919/ccc55666.2022.9902293
2022-01-01
Abstract: This paper introduces n-step heuristic dynamic programming (NSHDP), which combines regular temporal difference (TD) learning with TD(λ) learning to solve optimal control problems. First, the implementation of the basic value iteration algorithm is presented. Then, building on the traditional HDP algorithm, the architecture of the NSHDP(λ) algorithm is described. Most importantly, a stability condition for the NSHDP(λ) algorithm is developed. Furthermore, the one-step critic network, the n-step critic network, and the action network are designed. Finally, the effectiveness of the proposed algorithm is verified by a simulation experiment.
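As a rough illustration of the basic value iteration step mentioned in the abstract, the sketch below runs value iteration on a scalar linear-quadratic regulation problem, which is a simple special case of an affine system. It is not the NSHDP(λ) algorithm itself and uses no critic or action networks. The system x_{k+1} = a x_k + b u_k, the stage cost q x² + r u², the zero initial value function, and all function names are assumptions chosen for illustration. With a quadratic value function V_i(x) = p_i x², each value iteration sweep reduces to a scalar Riccati update.

```python
def value_iteration_scalar_lq(a, b, q, r, iters=200):
    """Value iteration for x_{k+1} = a*x + b*u with stage cost q*x^2 + r*u^2.

    Assumed setup (not taken from the paper): V_0(x) = 0 and V_i(x) = p_i * x^2.
    Returns the list [p_0, p_1, ..., p_iters].
    """
    p = 0.0  # V_0 = 0 is the usual value iteration initialization
    history = [p]
    for _ in range(iters):
        # V_{i+1}(x) = min_u { q*x^2 + r*u^2 + V_i(a*x + b*u) }
        # Minimizing over u in closed form gives the scalar Riccati update.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        history.append(p)
    return history


def greedy_gain(a, b, r, p):
    """Feedback gain k of the greedy policy u = -k*x for V(x) = p*x^2."""
    return a * b * p / (r + b * b * p)


# Hypothetical parameters, chosen only to exercise the sketch.
A_COEF, B_COEF, Q_COST, R_COST = 0.9, 1.0, 1.0, 1.0
P_HISTORY = value_iteration_scalar_lq(A_COEF, B_COEF, Q_COST, R_COST)
K_GAIN = greedy_gain(A_COEF, B_COEF, R_COST, P_HISTORY[-1])
```

In this special case the iterates p_i are nondecreasing from p_0 = 0 and the greedy policy for the converged value function stabilizes the closed loop; n-step and λ-weighted variants change how the update target is formed rather than this overall iterate-then-improve structure.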