Abstract: This paper studies an orbital pursuit-evasion game involving multiple cooperative pursuers and a non-cooperative evader by applying a multi-agent reinforcement learning method. In particular, it addresses the challenges of complex orbital dynamics models and effective coordination among pursuers, which are generally difficult to solve with traditional methods on on-board computers. First, the simulation scenario and dynamics model of the orbital boundary-free pursuit-evasion game are established, and the criteria for successful pursuit and escape are designed. Then, an information model of a single satellite is built from local observations, reflecting its limited observation ability. The state and action spaces are designed by applying the Markov decision process framework of the multi-agent proximal policy optimization algorithm. Finally, using curriculum learning, a series of tasks of varying difficulty is designed to ensure the smooth evolution of the strategy. Simulations show that the pursuer satellites can cooperatively pursue the evader with a high success rate. Moreover, four typical gaming strategies are observed and analyzed: encirclement, pursuit-evasion, interception, and latency.

Satellite Swarm Orbital Pursuit-Evasion Game Based on Multi-agent Proximal Policy Optimization Algorithm