A partially observable multi-ship collision avoidance decision-making model based on deep reinforcement learning

Kangjie Zheng, Xinyu Zhang, Chengbo Wang, Mingyang Zhang, Hao Cui
DOI: https://doi.org/10.1016/j.ocecoaman.2023.106689
2023-06-28
Abstract: Unmanned ships have drawn widespread attention for their potential to enhance navigational safety, minimize human error, and improve shipping efficiency. Nevertheless, the complexity and uncertainty of mixed obstacle environments present significant challenges to developing unmanned ships, particularly in collision avoidance decision-making. This paper formulates collision avoidance decision-making for autonomous ships in mixed obstacle environments as a Partially Observable Markov Decision Process (POMDP), which can address the environment's complexity and uncertainty and improve decision accuracy. An image-state observation method is proposed, as images can provide more accurate, rich, and reliable information. A dense reward function is designed to address the issue of sparse rewards when training the algorithm. The Proximal Policy Optimization (PPO) algorithm is used for model training. Building on this, a route guidance method called PPO for POMDP with guidelines under dense reward (G-IPOMDP-PPO) is proposed, which can improve training efficiency. Simulations are conducted in various mixed obstacle environments, and the results are compared with those of conventional algorithms. The results show that the proposed model can safely and efficiently make collision avoidance decisions in complex and uncertain environments. This research provides a new solution and theoretical foundation for developing autonomous ships and can be extended to dynamic interactive collision avoidance in mixed obstacle environments.
water resources, oceanography
What problem does this paper attempt to address?