Enhanced LSTM‐DQN algorithm for a two‐player zero‐sum game in three‐dimensional space

Bo Lu, Le Ru, Maolong Lv, Shiguang Hu, Hongguo Zhang, Zilong Zhao
DOI: https://doi.org/10.1049/cth2.12677
IF: 2.67
2024-05-16
IET Control Theory and Applications
Abstract: The LSTM‐DQN‐HER algorithm is proposed by modelling the MDP and POMDP of a two‐player zero‐sum game problem in three‐dimensional space, and its effectiveness in solving this problem is verified by training and adversarial simulation of the agent. To tackle the challenges presented by the two‐player zero‐sum game (TZSG) in three‐dimensional space, this study introduces an enhanced deep Q‐network (DQN) algorithm that utilizes a long short‐term memory (LSTM) network. The primary objective of this algorithm is to enhance the temporal correlation of the TZSG in three‐dimensional space. Additionally, it incorporates the hindsight experience replay (HER) mechanism to improve the learning efficiency of the network and to mitigate the "sparse reward" problem that arises during prolonged training of the agent on the TZSG in three‐dimensional space. Furthermore, this method enhances the convergence and stability of the overall solution. An intelligent training environment centred on an airborne agent and its mutual pursuit interaction scenario was designed to verify the proposed approach's effectiveness. The algorithm training and comparison results show that the LSTM‐DQN‐HER algorithm outperforms similar algorithms in solving the TZSG in three‐dimensional space. In conclusion, this paper presents an improved DQN algorithm based on LSTM and incorporates the HER mechanism to address the challenges posed by the TZSG in three‐dimensional space. The proposed algorithm enhances the solution's temporal correlation, learning efficiency, convergence, and stability. The simulation results confirm its superior performance in solving the TZSG in three‐dimensional space.
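The abstract does not say which HER variant, reward function, or state representation the authors use, so the following is only a minimal sketch of the general idea behind hindsight experience replay. It assumes a hypothetical 3-D position as the state, a hypothetical sparse reward of 0 at the goal and -1 elsewhere, and the commonly used "final" relabelling strategy, in which the last state actually reached in an episode is treated as if it had been the goal. All names (`Transition`, `sparse_reward`, `her_relabel`) are illustrative and are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical state: a 3-D position (an assumption, not the paper's state space).
State = Tuple[float, float, float]


class Transition(NamedTuple):
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State, tol: float = 1e-6) -> float:
    """Assumed sparse reward: 0.0 if the achieved state is within tol of the goal, else -1.0."""
    dist_sq = sum((a - g) ** 2 for a, g in zip(achieved, goal))
    return 0.0 if dist_sq <= tol ** 2 else -1.0


def her_relabel(episode: List[Transition]) -> List[Transition]:
    """Relabel an episode with the 'final' strategy.

    The last state reached becomes the substitute goal, and every reward is
    recomputed against it, so a failed episode still yields one rewarded step.
    """
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]
```

In a replay-based learner, both the original transitions and the relabelled copies would typically be stored in the replay buffer, which gives the network non-trivial reward signal even when the real goal is rarely reached.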
automation & control systems; engineering, electrical & electronic; instruments & instrumentation