Realizing Midcourse Penetration With Deep Reinforcement Learning

Liang Jiang, Ying Nan, Zhi-Han Li
DOI: https://doi.org/10.1109/access.2021.3091605
IF: 3.9
2021-01-01
IEEE Access
Abstract: A midcourse maneuver controller is obtained using deep reinforcement learning to maintain the survivability of a ballistic missile. First, the midcourse phase is abstracted as a Markov decision process (MDP) with an unknown system state equation. Then, a controller formed by a Dueling Double Deep Q (D3Q) neural network is used to approximate the state-action value function of the MDP. So that deep reinforcement learning can improve the controller's intelligence, the state space, action space, and instant reward function of the MDP are customized. The controller takes the real-time situation as input and outputs the ignition states of the pulse motors. Offline training shows that deep reinforcement learning converges to the optimal strategy after approximately 65 hours. Online tests demonstrate the controller's ability to avoid an interceptor intelligently and to account for the re-entry error. In scenarios with multiple random factors, the controller achieved a penetration probability of 100% and a mean re-entry error of less than 5000 m.
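The abstract names the D3Q network but does not spell out its two defining pieces. The following is a minimal sketch of them, assuming the standard mean-subtracted dueling aggregation and the standard double-Q learning target. The function names, array shapes, and the discount factor `gamma` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dueling_q(value, advantage):
    """Combine V(s) and A(s, a) into Q(s, a) (mean-subtracted dueling form).

    value:     shape (batch, 1), state value from the value stream
    advantage: shape (batch, n_actions), output of the advantage stream
    """
    value = np.asarray(value, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    # Subtracting the mean advantage keeps V and A identifiable.
    return value + advantage - advantage.mean(axis=1, keepdims=True)


def double_q_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double-Q target: the online network picks the next action,
    the target network evaluates it.

    gamma is a hypothetical discount factor, not a value from the paper.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_next_online = np.asarray(q_next_online, dtype=float)
    q_next_target = np.asarray(q_next_target, dtype=float)
    # Action selection uses the online network only.
    best_action = np.argmax(q_next_online, axis=1)
    # Action evaluation uses the target network only.
    evaluated = q_next_target[np.arange(best_action.shape[0]), best_action]
    # Terminal transitions (done == 1) receive no bootstrapped value.
    return reward + gamma * (1.0 - done) * evaluated
```

In a controller of the kind described, each action index would presumably correspond to one combination of pulse-motor ignition states, and the action with the largest Q value would be selected at each decision step.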
computer science, information systems; telecommunications; engineering, electrical & electronic