Adaptive Optimal Control of Nonlinear Systems with Multiple Time-scale Eligibility Traces

Jun Rao, Jingcheng Wang, Jiahui Xu, Shunyu Wu
DOI: https://doi.org/10.1109/CDC49753.2023.10383316
2023-01-01
Abstract: Adaptive dynamic programming (ADP) is one of the main methods for solving the optimal control problem of nonlinear systems. In recent years, eligibility traces have been used to reduce the computational burden of learning the value function, but existing fixed eligibility traces struggle to guarantee stable convergence, especially under environmental changes and with complex neural network structures. To address these issues, a novel off-policy algorithm, T-HDP(λ) with Multiple Time-scale Eligibility Traces (MET), is proposed. By using MET, the new algorithm adaptively accumulates gradients and incorporates more gradient information, which drives the control toward the optimal direction faster. T-step truncated λ-returns are used to solve the infinite-horizon optimal control problem, and a new importance sampling ratio is designed to correct the value function. Furthermore, the convergence and boundedness of the algorithm are proved. With an actor-critic network architecture, the optimal value function and policy are well approximated. Finally, a simulation example shows that, compared with the original algorithm, the proposed algorithm converges faster and has lower variance.
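The abstract relies on T-step truncated λ-returns. As a rough illustration, the sketch below computes the textbook truncated λ-return from the TD(λ) literature, written with stage costs because ADP minimizes cost. It does not reproduce the paper's T-HDP(λ) update, its MET traces, or its new importance sampling ratio. All function names are hypothetical, and `values[k]` is assumed to be the critic's current estimate V(s_k). Two forms are shown: the direct weighted sum of n-step returns and an equivalent backward recursion.

```python
def n_step_return(costs, values, t, n, gamma):
    """G_{t:t+n}: n discounted stage costs plus a bootstrap on V(s_{t+n})."""
    g = 0.0
    for k in range(n):
        g += gamma**k * costs[t + k]
    return g + gamma**n * values[t + n]


def truncated_lambda_return(costs, values, t, T, gamma, lam):
    """Direct form of the T-step truncated lambda-return G^lam_{t:t+T}.

    Weights (1 - lam) * lam**(n - 1) on the n-step returns for n < T, and the
    leftover weight lam**(T - 1) on the full T-step return, so the weights sum to 1.
    """
    g = 0.0
    for n in range(1, T):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(costs, values, t, n, gamma)
    g += lam ** (T - 1) * n_step_return(costs, values, t, T, gamma)
    return g


def truncated_lambda_return_recursive(costs, values, t, T, gamma, lam):
    """Same quantity via the backward recursion
    G_{k:h} = c_k + gamma * ((1 - lam) * V(s_{k+1}) + lam * G_{k+1:h}),
    starting from G_{h:h} = V(s_h) with horizon h = t + T.
    """
    h = t + T
    g = values[h]
    for k in range(h - 1, t - 1, -1):
        g = costs[k] + gamma * ((1 - lam) * values[k + 1] + lam * g)
    return g
```

Setting `lam = 0` recovers the one-step TD target, and `lam = 1` recovers the plain T-step return. The backward recursion is the form that usually pairs with eligibility-trace implementations, since each step reuses the previous result.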