Data-Based Optimal Switching and Control with Admissibility Guaranteed Q-learning

Zhengrong Xiang, Pingchuan Li, Wencheng Zou, Choon Ki Ahn
DOI: https://doi.org/10.1109/tnnls.2024.3405739
IF: 14.255
2024-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: This article addresses data-based optimal switching and control codesign for discrete-time nonlinear switched systems via a two-stage approximate dynamic programming (ADP) algorithm. Through offline policy improvement and policy evaluation, the proposed algorithm iteratively determines the optimal hybrid control policy using system input/output data. A rigorous convergence proof is given for the two-stage ADP algorithm. Admissibility, an essential property of the hybrid control policy, must be ensured for practical application. To this end, the properties of the hybrid control policies are analyzed and an admissibility criterion is obtained. To realize the proposed Q-learning algorithm, an actor-critic neural network (NN) structure is adopted that employs multiple NNs to approximate the Q-functions and control policies for the different subsystems. By applying the proposed admissibility criterion, the obtained hybrid control policy is guaranteed to be admissible. Finally, two numerical simulations verify the effectiveness of the proposed algorithm.
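To make the idea of a hybrid (switching plus continuous) control policy learned from data more concrete, the following is a minimal sketch and not the paper's algorithm. It swaps the two-stage ADP scheme and actor-critic NNs for a plain tabular Q-iteration over recorded transitions. The scalar switched system, the grids, the quadratic stage cost, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical toy switched system (NOT from the paper):
#   x_{k+1} = A[v] * x_k + B[v] * u_k,  with mode v in {0, 1}
A = (0.9, 1.1)
B = (0.2, 1.0)
X_GRID = np.linspace(-1.0, 1.0, 41)   # discretized states (assumption)
U_GRID = np.linspace(-1.0, 1.0, 21)   # discretized controls (assumption)


def nearest(x):
    """Index of the grid state closest to x (x is clipped to the grid range)."""
    return int(np.abs(X_GRID - np.clip(x, -1.0, 1.0)).argmin())


def stage_cost(x, u):
    """Assumed quadratic stage cost."""
    return x**2 + u**2


def collect_data():
    """Record one transition (state, mode, control, cost, next state) per triple.

    The model is used only here, to generate data; the learner below
    sees nothing but the recorded tuples.
    """
    data = []
    for i, x in enumerate(X_GRID):
        for v in range(2):
            for j, u in enumerate(U_GRID):
                x_next = A[v] * x + B[v] * u
                data.append((i, v, j, stage_cost(x, u), nearest(x_next)))
    return data


def q_iteration(data, sweeps=200):
    """Tabular Q-iteration: Q(x, v, u) <- cost + min over (v', u') of Q(x', v', u')."""
    i, v, j, cost, i_next = (np.array(col) for col in zip(*data))
    Q = np.zeros((len(X_GRID), 2, len(U_GRID)))
    for _ in range(sweeps):
        V = Q.min(axis=(1, 2))          # best value over both mode and control
        Q[i, v, j] = cost + V[i_next]   # right-hand side uses the previous sweep's V
    return Q


def greedy(Q, i):
    """Hybrid policy at grid state i: the (mode, control) pair minimizing Q."""
    v, j = np.unravel_index(Q[i].argmin(), Q[i].shape)
    return int(v), float(U_GRID[j])


if __name__ == "__main__":
    Q = q_iteration(collect_data())
    print(greedy(Q, nearest(0.5)))
```

The minimization runs jointly over the discrete mode and the continuous-valued (here discretized) control, which is what makes the resulting policy hybrid. This sketch does not check admissibility; the paper's admissibility criterion is not reproduced here.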