Autonomous spacecraft collision avoidance with a variable number of space debris based on safe reinforcement learning

Chaoxu Mu, Shuo Liu, Ming Lu, Zhaoyang Liu, Lei Cui, Ke Wang
DOI: https://doi.org/10.1016/j.ast.2024.109131
IF: 5.457
2024-04-20
Aerospace Science and Technology
Abstract: The avoidance of collisions with multiple pieces of space debris by autonomous spacecraft has garnered significant interest worldwide. Although deep reinforcement learning (DRL) is a suitable model-free, data-driven framework, applying it to autonomous spacecraft collision avoidance remains difficult because of limitations in constraint satisfaction and environment state perception. In this research, a state-of-the-art penalized proximal policy optimization (P3O) method is applied to the spacecraft's autonomous collision avoidance problem, which is formalized as a constrained Markov decision process (CMDP). In contrast with traditional DRL methods, P3O promises to satisfy multiple constraints in actual spacecraft operations while also facilitating efficient learning in multi-dimensional, continuous state and action spaces. The scalability of the P3O algorithm is enhanced by combining it with the feature extraction and variable-length input capabilities of long short-term memory (LSTM) networks, enabling P3O to adapt to a variable number of space debris without network retraining. Simulation cases compare the proposed method with five other methods and verify its superior performance in terms of scalability, energy consumption, collision probability, and constraint satisfaction.
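To make the penalized objective mentioned in the abstract concrete, the following is a minimal sketch of a P3O-style loss for a single constraint, written with NumPy. It follows the general form of penalized PPO (a clipped reward surrogate plus a ReLU-style penalty on a clipped cost surrogate) and is not the authors' implementation. The function name `p3o_loss`, its argument names, and the default values for `clip_eps`, `kappa`, and `gamma` are all assumptions chosen for illustration; the paper handles multiple constraints, which would add one penalty term per constraint.

```python
import numpy as np


def p3o_loss(ratio, adv_reward, adv_cost, cost_return, cost_limit,
             clip_eps=0.2, kappa=20.0, gamma=0.99):
    """Illustrative P3O-style loss (to be minimized) for one constraint.

    ratio       : pi_new(a|s) / pi_old(a|s) per sample
    adv_reward  : reward advantage estimates per sample
    adv_cost    : cost advantage estimates per sample
    cost_return : estimated discounted cost of the old policy
    cost_limit  : constraint threshold d in the CMDP
    All default hyperparameters are assumed values, not from the paper.
    """
    ratio = np.asarray(ratio, dtype=float)
    adv_reward = np.asarray(adv_reward, dtype=float)
    adv_cost = np.asarray(adv_cost, dtype=float)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)

    # Reward term: negative of the standard PPO clipped surrogate.
    loss_reward = -np.mean(np.minimum(ratio * adv_reward, clipped * adv_reward))

    # Cost term: pessimistic clipped cost surrogate plus the current
    # constraint violation of the old policy.
    loss_cost = (np.mean(np.maximum(ratio * adv_cost, clipped * adv_cost))
                 + (1.0 - gamma) * (cost_return - cost_limit))

    # Penalty is active only when the cost term is positive.
    return float(loss_reward + kappa * max(0.0, loss_cost))


# Constraint satisfied: the penalty vanishes and only the reward term remains.
safe = p3o_loss([1.0, 1.0], [1.0, -1.0], [0.0, 0.0], cost_return=5.0, cost_limit=10.0)
# Constraint violated: the penalty adds kappa * (1 - gamma) * (20 - 10).
unsafe = p3o_loss([1.0, 1.0], [1.0, -1.0], [0.0, 0.0], cost_return=20.0, cost_limit=10.0)
```

In this sketch the penalty term switches on only when the estimated cost exceeds the limit, so the update behaves like ordinary PPO while the constraint holds and pushes toward feasibility otherwise.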
engineering, aerospace