Parallel Cross Entropy Policy Gradient Adaptive Dynamic Programming for Optimal Tracking Control of Discrete-Time Nonlinear Systems
Jiahui Xu, Jingcheng Wang, Jun Rao, Yanjiu Zhong, Shunyu Wu, Qifang Sun
DOI: https://doi.org/10.1109/tsmc.2024.3373456
2024-01-01
IEEE Transactions on Systems, Man, and Cybernetics: Systems
Abstract: Policy gradient adaptive dynamic programming (PGADP) is a recently acclaimed technique for the optimal control design of nonlinear systems. However, it requires a large amount of interaction data from the controlled system, and collecting such data can be costly or dangerous in some scenarios. This article introduces a PGADP algorithm based on a parallel cross entropy optimization method (PCEOM-PGADP) for designing an optimal tracking controller for discrete-time nonlinear systems. The tracking problem is transformed into a regulation problem by constructing a tracking error system. The proposed algorithm is implemented with an actor–critic structure, in which the actor network represents the control policy and the critic network evaluates its performance; the optimal policy is obtained through their iterative interaction. The approach also uses the parallel cross entropy optimization method (PCEOM) to obtain a reasonable initial control policy for PGADP, which speeds up learning. Convergence of the algorithm is established by showing that the generated $Q$ function forms a monotonically nonincreasing sequence. Finally, the effectiveness of the proposed PCEOM-PGADP algorithm is verified by simulation on a complex automated driving tracking system.
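Two ideas from the abstract can be illustrated in a few lines: turning tracking into regulation through a tracking error system, and using a cross entropy search to find an initial policy. The Python sketch below is not the authors' implementation. It assumes a hypothetical scalar plant $x_{k+1} = f(x_k) + G u_k$ with $f(x) = 0.8x + 0.2\sin x$ and $G = 0.5$, a constant reference $r = 1$, and a feedforward term $u_d = (r - f(r))/G$. With $u_k = u_d + v_k$, the error $e_k = x_k - r$ then obeys $e_{k+1} = f(e_k + r) - f(r) + G v_k$, which is a regulation problem in $e$. It further assumes a one-parameter linear policy $v = -\theta e$, a stage cost $e^2 + 0.1 v^2$, and interprets "parallel" as several independently seeded cross entropy runs dispatched to a thread pool, keeping the best result. All names (`policy_cost`, `cem_run`, `parallel_cem`, etc.) are hypothetical. The actor–critic PGADP stage that would refine the returned $\theta$ is omitted.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scalar plant x_{k+1} = f(x_k) + G * u_k (assumed, not the paper's driving model).
G = 0.5


def f(x: float) -> float:
    return 0.8 * x + 0.2 * math.sin(x)


R_REF = 1.0                      # constant reference to track (assumed)
U_D = (R_REF - f(R_REF)) / G     # feedforward: x stays at R_REF when the error is zero
R_WEIGHT = 0.1                   # control penalty in the stage cost (assumed)
E0_SET = (-2.0, -1.0, 1.0, 2.0)  # initial tracking errors used to score a policy (assumed)
HORIZON = 30


def error_step(e: float, v: float) -> float:
    """Tracking error dynamics: e_{k+1} = f(e_k + r) - f(r) + G * v_k."""
    return f(e + R_REF) - f(R_REF) + G * v


def policy_cost(theta: float) -> float:
    """Accumulated cost sum(e^2 + R * v^2) of the linear policy v = -theta * e."""
    total = 0.0
    for e in E0_SET:
        for _ in range(HORIZON):
            v = -theta * e
            total += e * e + R_WEIGHT * v * v
            e = error_step(e, v)
            if abs(e) > 1e3:  # diverging sample: return a large finite penalty
                return 1e12
    return total


def cem_run(seed: int, iters: int = 15, pop: int = 20, n_elite: int = 5) -> float:
    """One cross entropy search over theta; returns the final sampling mean."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 2.0
    for _ in range(iters):
        samples = [rng.gauss(mu, sigma) for _ in range(pop)]
        elites = sorted(samples, key=policy_cost)[:n_elite]  # lowest-cost samples
        mu = sum(elites) / n_elite
        var = sum((t - mu) ** 2 for t in elites) / n_elite
        sigma = max(math.sqrt(var), 1e-3)  # floor keeps a little exploration
    return mu


def parallel_cem(seeds=(0, 1, 2, 3)) -> tuple[float, float]:
    """Dispatch independent CEM runs to a thread pool and keep the best final mean."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        thetas = list(pool.map(cem_run, seeds))
    best = min(thetas, key=policy_cost)
    return best, policy_cost(best)


if __name__ == "__main__":
    theta0, cost0 = parallel_cem()
    # Closed-loop check on the original plant with u = U_D - theta0 * (x - R_REF).
    x = 3.0
    for _ in range(100):
        x = f(x) + G * (U_D - theta0 * (x - R_REF))
    print(f"theta0={theta0:.3f} cost={cost0:.3f} zero-policy cost={policy_cost(0.0):.3f}")
    print(f"final tracking error={abs(x - R_REF):.2e}")
```

The returned gain plays the role of the "reasonable initial control policy" that PGADP would then improve. A `ThreadPoolExecutor` keeps each run deterministic per seed and the sketch short, but because of Python's global interpreter lock it gives no CPU speedup; a `ProcessPoolExecutor` with top-level functions would be the usual choice for real parallelism.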
automation & control systems, computer science, cybernetics