Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces
Jun Rao, Jingcheng Wang, Jiahui Xu, Shangwei Zhao
DOI: https://doi.org/10.1007/s11071-023-08909-6
IF: 5.741
2023-09-29
Nonlinear Dynamics
Abstract: Optimal control of nonlinear systems using adaptive dynamic programming (ADP) methods has been a hot topic in recent years. However, for unknown nonlinear systems with limited data, and for optimal control problems with an infinite-horizon performance index, many proposed methods are no longer efficient or applicable. To address these issues, a novel model-free T-DPG(λ) method with eligibility traces (ET) is presented for a class of affine discrete-time nonlinear systems. By utilizing eligibility traces, the new method extracts more information from limited data and guides the control toward the optimal direction faster. Finite terms, rather than infinite terms, of the optimal performance are used to solve the infinite-horizon optimal control problem. Because the system dynamics are unknown, a model-free algorithm is proposed that samples sequences containing only the control signal and the state transition process. Furthermore, the convergence and boundedness of the algorithm are roughly proved. With a neural-network-based actor-critic architecture, the optimal policy is well approximated by actor networks. Finally, the effectiveness of the proposed algorithm is demonstrated by two simulation examples.
engineering, mechanical; mechanics
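
The abstract credits eligibility traces with extracting more information from limited data. As a rough illustration of that idea only, the sketch below runs generic tabular TD(λ) with accumulating traces on a hypothetical three-state chain with unit stage cost. It is not the paper's T-DPG(λ) algorithm: the chain, the function name `td_lambda_tabular`, and the values of `alpha`, `gamma`, and `lam` are all assumptions chosen for the example.

```python
def td_lambda_tabular(episode, n_states, alpha=0.1, gamma=0.9, lam=0.8, sweeps=500):
    """Estimate the cost-to-go of each state with accumulating eligibility traces.

    `episode` is a list of (state, cost, next_state, done) transitions.
    This is textbook TD(lambda), used here as a stand-in illustration.
    """
    v = [0.0] * n_states
    for _ in range(sweeps):
        trace = [0.0] * n_states  # traces are reset at the start of each replay
        for s, cost, s_next, done in episode:
            # One-step temporal-difference error for the observed transition.
            target = cost + (0.0 if done else gamma * v[s_next])
            delta = target - v[s]
            # Decay all traces, then mark the state that was just visited.
            trace = [gamma * lam * e for e in trace]
            trace[s] += 1.0
            # A single sample updates every recently visited state at once.
            v = [vi + alpha * delta * ei for vi, ei in zip(v, trace)]
    return v


# Hypothetical chain 0 -> 1 -> 2 -> terminal, with a cost of 1 per step.
episode = [(0, 1.0, 1, False), (1, 1.0, 2, False), (2, 1.0, 2, True)]
values = td_lambda_tabular(episode, n_states=3)
```

With `gamma = 0.9`, the exact costs-to-go for this chain are 2.71, 1.9, and 1.0 for states 0, 1, and 2, and the estimates in `values` approach them as the same three transitions are replayed. The trace lets each temporal-difference error correct earlier states in the same pass, which is the sense in which a small data set is reused.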