Tube-based robust reinforcement learning for autonomous maneuver decision for UCAVs

Lixin WANG, Sizhuang ZHENG, Haiyin PIAO, Changqian LU, Ting YUE, Hailiang LIU
DOI: https://doi.org/10.1016/j.cja.2024.03.025
IF: 5.7
2024-03-22
Chinese Journal of Aeronautics
Abstract: Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by using tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the mitigated AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions. This finding highlights that TRRL enhances robustness by promoting a conservative policy. To effectively balance aggressiveness and robustness, the proposed TRRL algorithm introduces a "laziness factor" as a weight on robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm exhibits superior air combat performance compared to selected robust RL baselines.
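The trade-off between aggressiveness and robustness described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the additive penalty form, the action names, the tube sizes, and the helpers `shaped_reward` and `pick_action` are all assumptions made for illustration. The only ideas taken from the abstract are that tube sizes are precomputed offline into a library, that smaller tubes correspond to more cautious actions, and that a laziness factor weights robustness.

```python
from typing import Dict

# Hypothetical offline "tube library": a precomputed tube size per maneuver.
# Values are made up; the abstract only says smaller tubes mean more cautious actions.
TUBE_LIBRARY: Dict[str, float] = {
    "level_flight": 0.2,
    "gentle_turn": 0.6,
    "hard_turn": 1.4,
}


def shaped_reward(task_reward: float, action: str, laziness: float) -> float:
    """Assumed shaping: subtract the tube size, weighted by the laziness factor."""
    return task_reward - laziness * TUBE_LIBRARY[action]


def pick_action(task_rewards: Dict[str, float], laziness: float) -> str:
    """Greedy choice over the shaped rewards (a stand-in for a learned policy)."""
    return max(
        task_rewards,
        key=lambda a: shaped_reward(task_rewards[a], a, laziness),
    )


# Hypothetical task rewards: more aggressive maneuvers pay more without disturbances.
rewards = {"level_flight": 0.5, "gentle_turn": 0.8, "hard_turn": 1.0}
for laziness in (0.0, 0.5, 1.0):
    print(laziness, pick_action(rewards, laziness))
```

Under these assumed numbers, raising the laziness factor shifts the greedy choice from the aggressive maneuver toward the cautious one, which is the qualitative behavior the abstract attributes to TRRL.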
engineering, aerospace