Online policy iteration ADP-based attitude-tracking control for hypersonic vehicles

Xiao Han, Zongzhun Zheng, Lei Liu, Bo Wang, Zhongtao Cheng, Huijin Fan, Yongji Wang
DOI: https://doi.org/10.1016/j.ast.2020.106233
IF: 5.457
2020-11-01
Aerospace Science and Technology
Abstract: An online adaptive dynamic programming (ADP) attitude-tracking controller based on policy iteration is proposed, aiming to approach the optimal control of hypersonic vehicles (HVs). The Bellman equation, known as the principal recursive dynamic programming formula, is provided to obtain the controller. In particular, the control action is generated by the ADP controller to track the attitude trajectory. In order to approach optimal control in the uncertain nonlinear HVs system, we use policy iteration to approximate the Bellman equation and build an actor-predictor-critic framework, in which the action network, state estimator and critic network are adopted to implement the policy iteration. Meanwhile, an offline learning method is provided to approach the initial value of iterative computations and improve the efficiency of online learning. The comparative simulations demonstrate the good performance of PIADP with aerodynamic parameter perturbations and random disturbances.
engineering, aerospace
What problem does this paper attempt to address?
The paper addresses attitude-tracking control for hypersonic vehicles (HVs). Specifically, it designs an online controller based on adaptive dynamic programming (ADP) that approaches optimal tracking of the HV attitude trajectory. To this end, the authors use policy iteration (PI) to approximate the Bellman equation and implement it in an actor-predictor-critic framework comprising an action network, a state estimator, and a critic network. This framework lets the controller approach optimal control of the uncertain, nonlinear HV system. The paper also proposes an offline learning method that approximates the initial values of the iterative computation, which improves the efficiency of online learning. Comparative simulations show good performance of the proposed PIADP (policy iteration adaptive dynamic programming) controller under aerodynamic parameter perturbations and random disturbances.
In summary, the main contributions of the paper are:
1. An ADP-based controller for the HV attitude control system that approaches optimal control. The controller uses policy iteration to approximate the Bellman equation and introduces a new actor-predictor-critic framework to implement it.
2. An analysis of the properties of policy iteration, showing that it can converge online to the optimal control.
3. A description, with the aid of Lyapunov stability theory, of how neural networks approximate the Bellman equation, together with offline and online learning algorithm schemes.
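The two alternating steps of policy iteration mentioned above (evaluate the current policy through the Bellman equation, then improve the policy) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's PIADP controller: it uses a known discrete-time linear system with a quadratic cost instead of neural networks and uncertain HV dynamics, so both steps have closed-form solutions. The matrices `A`, `B`, `Q`, `R`, the initial gain `K0`, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: cost matrix P of the fixed policy u = -K x.

    Solves the linear Bellman (Lyapunov) equation
    P = Q + K^T R K + (A - B K)^T P (A - B K) by vectorization.
    """
    Ac = A - B @ K                      # closed-loop dynamics under the policy
    M = Q + K.T @ R @ K                 # stage cost under the policy
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)              # symmetrize against round-off


def improve_policy(A, B, R, P):
    """Policy improvement: gain that is greedy with respect to P."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    """Alternate evaluation and improvement until the gain stops changing."""
    K = K0
    for i in range(max_iter):
        P = evaluate_policy(A, B, Q, R, K)
        K_new = improve_policy(A, B, R, P)
        if np.max(np.abs(K_new - K)) < tol:
            return K_new, P, i + 1
        K = K_new
    return K, P, max_iter


# Hypothetical example: a double integrator sampled at 0.1 s (not an HV model).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])             # assumed initial stabilizing gain

K, P, iters = policy_iteration(A, B, Q, R, K0)
print("iterations:", iters)
print("gain K:", K)
```

In this sketch the iteration needs a stabilizing initial gain `K0`, since otherwise the policy-evaluation step has no finite cost to compute. This is loosely analogous to the role the summary attributes to offline learning, which supplies initial values for the online iteration; the analogy is an interpretation, not a claim about the paper's algorithm.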