A Novel Deep Reinforcement Learning for POMDP-based Autonomous Ship Collision Decision-Making

Xinyu Zhang, Kangjie Zheng, Chengbo Wang, Jihong Chen, Huaiyuan Qi
DOI: https://doi.org/10.1007/s00521-023-08908-z
2023-01-01
Neural Computing and Applications
Abstract: To address the challenge of partially observable environment states in multi-ship collision avoidance decision-making, a novel collision avoidance decision model is developed based on Partially Observable Markov Decision Processes (POMDP), integrating both static and dynamic obstacles. A new reward mechanism is designed to overcome the problem of sparse rewards, incorporating the International Regulations for Preventing Collisions at Sea (COLREGs) into the reward function. The Proximal Policy Optimization (PPO) algorithm is employed to train the model, accompanied by a suitable network structure. Furthermore, image conversion and scaling operations are incorporated into the network architecture to reduce the dimensionality of the state space, thereby improving the training speed and fitting performance of the algorithm. A simulation environment containing multiple static and dynamic obstacles is then created on the Gym platform, and a series of experiments covering classic encounter scenarios is designed to validate the model. Evaluation metrics such as the total cumulative reward and the number of training steps demonstrate that the proposed algorithm makes accurate decisions in an environment with combined static and dynamic obstacles. Several ablation experiments and an analysis of total run-time show that the proposed reward mechanism effectively addresses the issue of sparse rewards, while the algorithm significantly improves the fitting speed and expedites the training process. A comparison with other classical deep reinforcement learning (DRL) algorithms confirms the proposed algorithm's ability to address the problem of partially observable environment states in multi-ship collision avoidance decision-making.
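The abstract does not describe the image conversion and scaling operations in detail. As a rough illustration only, the sketch below assumes the environment is rendered as an RGB image that is converted to grayscale and then shrunk by block averaging before reaching the policy network. The function names, the luminance weights, and the block-averaging scheme are assumptions made for this example, not the authors' implementation.

```python
def to_grayscale(rgb_image):
    """Convert an H x W grid of (r, g, b) tuples to an H x W grid of luminance values.

    The ITU-R BT.601 weights used here are an assumption for illustration.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def downscale(gray, factor):
    """Shrink a grayscale grid by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    out_h = len(gray) // factor
    out_w = len(gray[0]) // factor
    scaled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [
                gray[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        scaled.append(row)
    return scaled


def preprocess(rgb_image, factor=2):
    """Hypothetical observation pipeline: grayscale conversion, then downscaling."""
    return downscale(to_grayscale(rgb_image), factor)
```

With `factor=2`, a three-channel H x W observation becomes a single-channel (H/2) x (W/2) grid, so the network sees one twelfth as many input values. That reduction in state-space dimensionality is the effect the abstract attributes to these operations.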