Reinforcement Learning with Adjustable Convergence Rate for Data-Based Nonlinear Control

Guohan Tang, Ding Wang, Xin Li, Jin Ren, Nan Liu
DOI: https://doi.org/10.1109/ccdc62350.2024.10587338
2024-01-01
Abstract: This paper develops a value-iteration-based off-policy Q-learning algorithm that solves the optimal regulation problem for nonlinear systems with unknown dynamics. Under the off-policy mechanism, the algorithm uses a behavior policy for full exploration, which helps prevent the target policy from becoming trapped in a locally optimal solution. In addition, a relaxation factor is introduced to adjust the convergence rate of the cost function sequence. To implement the algorithm, a critic network and an action network approximate the optimal Q-function and the optimal control policy, respectively. Finally, a simulation example demonstrates the effectiveness of the proposed algorithm.
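The role of the relaxation factor can be illustrated with a minimal tabular sketch. The abstract does not state the update rule, so the form used here is an assumption: each iteration blends the previous Q-value with the value-iteration target, Q_{i+1}(x,u) = (1 - omega) Q_i(x,u) + omega [U(x,u) + min_a Q_i(x',a)], with omega = 1 recovering plain value iteration. The toy system (five states on a line, utility x^2 + u^2), the function names, and the use of a lookup table in place of the critic and action networks are all hypothetical choices made for illustration, not the authors' setup.

```python
import random


def collect_data(n_states, actions, n_samples, seed=0):
    """Gather transitions (x, u, x_next) with a random behavior policy.

    The clipped-integrator dynamics below are a hypothetical toy system.
    They are used only to generate data; the learner never sees them.
    """
    rng = random.Random(seed)
    data = set()
    for _ in range(n_samples):
        x = rng.randrange(n_states)
        u = rng.choice(actions)
        x_next = min(max(x + u, 0), n_states - 1)
        data.add((x, u, x_next))
    return sorted(data)


def relaxed_q_iteration(data, n_states, actions, omega, tol=1e-9, max_iter=10000):
    """Value-iteration Q-learning on stored data with an assumed relaxation rule.

    Returns the Q-table and the number of sweeps performed before the largest
    change in any entry dropped below tol.
    """
    q = {(x, u): 0.0 for x in range(n_states) for u in actions}  # Q_0 = 0
    for i in range(1, max_iter + 1):
        new_q = dict(q)
        for x, u, x_next in data:
            utility = x * x + u * u  # assumed quadratic utility
            target = utility + min(q[(x_next, a)] for a in actions)
            # Assumed relaxation: convex blend of old value and VI target.
            new_q[(x, u)] = (1.0 - omega) * q[(x, u)] + omega * target
        delta = max(abs(new_q[k] - q[k]) for k in q)
        q = new_q
        if delta < tol:
            return q, i
    return q, max_iter


if __name__ == "__main__":
    actions = (-1, 0, 1)
    data = collect_data(5, actions, 500)
    for omega in (0.5, 1.0):
        q, sweeps = relaxed_q_iteration(data, 5, actions, omega)
        greedy_cost = min(q[(4, a)] for a in actions)
        print(f"omega={omega}: sweeps={sweeps}, cost-to-go from x=4: {greedy_cost:.6f}")
```

In this sketch a smaller omega slows the Q-sequence while leaving its limit unchanged, which is the sense in which the factor adjusts the convergence rate. The target policy is the greedy policy with respect to the table, while the data come from the separate random behavior policy.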