Optimal Control for Continuous-Time Unknown Nonlinear Affine Systems: A $\mathcal{Q}$-Learning Approach
Shuhang Yu, Huaguang Zhang, Zhongyang Ming, Jiayue Sun
DOI: https://doi.org/10.1109/tase.2023.3327264
IF: 6.636
2023-01-01
IEEE Transactions on Automation Science and Engineering
Abstract: In this paper, we propose a $\mathcal{Q}$-learning approach to the optimal control problem for continuous-time nonlinear systems that requires no knowledge of the system dynamics. First, the Hamiltonian and the optimal cost function are used to express the $\mathcal{Q}$-function of continuous-time affine systems. To reduce the dependence of the algorithm on system information, a novel $\mathcal{Q}$-learning approach is then derived that obtains optimal solutions for nonlinear continuous-time systems without requiring knowledge of either the drift dynamics $p(x)$ or the input gain $q(x)$. To implement this approach, critic and actor neural networks are updated alternately using an integral reinforcement learning method to estimate the $\mathcal{Q}$-function. Furthermore, all signals in the closed-loop system are shown to be uniformly ultimately bounded (UUB). It is worth noting that few existing works address the optimal control problem of continuous-time nonlinear uncertain systems via $\mathcal{Q}$-learning with actor/critic network iteration. Finally, two simulations are used to confirm the effectiveness of the proposed algorithm.
Note to Practitioners: Nonlinear continuous-time systems are ubiquitous in engineering practice and are widely employed due to their versatility and effectiveness. For such systems, a $\mathcal{Q}$-learning approach that targets optimality is proposed to improve control efficiency while reducing costs. However, it is well known that accurately capturing all the dynamic information of a system is a formidable task in practice. This difficulty inevitably weakens the feasibility of model-based control algorithms. Since the $\mathcal{Q}$-learning algorithm presented in this paper does not require any knowledge of the system dynamics, it is a promising enabler for enhancing the effectiveness and flexibility of engineering activities.
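As a rough illustration of how a $\mathcal{Q}$-function can be assembled from the Hamiltonian and the optimal cost, the sketch below uses a common formulation for continuous-time affine systems. Only $p(x)$, $q(x)$, and $\mathcal{Q}$ come from the abstract; the state cost $S(x)$, the input weight $R$, and the exact form of the cost are assumptions made here for illustration and may differ from the definitions in the paper.

$$
\begin{aligned}
\dot{x} &= p(x) + q(x)\,u, \\
V(x(t)) &= \int_t^{\infty} \big( S(x) + u^{\top} R\, u \big)\, d\tau, \\
H(x, u, \nabla V) &= S(x) + u^{\top} R\, u + \nabla V^{\top} \big( p(x) + q(x)\,u \big), \\
\mathcal{Q}(x, u) &= V^{*}(x) + H(x, u, \nabla V^{*}).
\end{aligned}
$$

Under these assumptions the Hamilton-Jacobi-Bellman equation gives $\min_u H(x, u, \nabla V^{*}) = 0$, so $\min_u \mathcal{Q}(x, u) = V^{*}(x)$ and the optimal control is $u^{*}(x) = \arg\min_u \mathcal{Q}(x, u)$. Because the minimization acts on $\mathcal{Q}$ directly, an approximation of $\mathcal{Q}$ learned from data can yield the control without explicit use of $p(x)$ or $q(x)$.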