Abstract: A safe and reliable task planning method is a prerequisite for multiple unmanned surface vessels (multi-USVs) to collaboratively execute ocean observation data collection tasks. Deep reinforcement learning (DRL) combines the powerful nonlinear function-fitting capability of deep neural networks with the decision-making and control abilities of reinforcement learning, which offers a novel approach to the multi-USV task planning problem. However, when applied to multi-USV task planning, DRL faces challenges such as a vast exploration space, long training times, and an unstable training process. To address these challenges, this article proposes a multi-USV task planning method based on improved DRL. Drawing on the idea of a value decomposition network, the proposed method breaks the multi-USV task planning problem down into two subproblems: 1) task allocation and 2) autonomous collision avoidance. Separate state spaces, action spaces, and reward functions are designed for each subproblem. On this basis, a deep neural network maps the state space of each subproblem to the action space of each USV, and the strategy generated by the network is assessed with the corresponding reward function. This integrates task allocation and path planning into a comprehensive task planning framework. The deep neural network consists of Actor networks and Critic networks. During Critic training, different Critic networks are trained with different methods to improve the convergence speed of the algorithm. Specifically, an improved temporal difference error method is used to train the Critic network that evaluates autonomous collision avoidance strategies, which improves the autonomous collision avoidance ability of the USVs.
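The two building blocks named above, the additive value decomposition borrowed from value decomposition networks and the temporal difference (TD) error on which a Critic is trained, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-subproblem values are plain numbers rather than network outputs, the decomposition is the standard VDN-style sum applied here to the two subproblems, and `td_error` is the textbook one-step TD error, not the improved variant proposed in this article (whose exact form is not given in the abstract). All function names are hypothetical.

```python
from typing import Sequence


def vdn_joint_value(sub_values: Sequence[float]) -> float:
    """VDN-style additive decomposition (assumed here): the joint value is the
    sum of the component values, e.g. one value for task allocation and one
    for autonomous collision avoidance."""
    return float(sum(sub_values))


def td_error(reward: float, value: float, next_value: float,
             gamma: float = 0.99, done: bool = False) -> float:
    """Standard one-step TD error: delta = r + gamma * V(s') - V(s).
    The bootstrap term is dropped at the end of an episode."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def critic_update(value: float, delta: float, lr: float = 0.1) -> float:
    """Tabular-style critic step that moves V(s) toward the TD target."""
    return value + lr * delta


# Hypothetical subproblem values for one USV at one time step.
q_alloc, q_avoid = 2.0, -0.5
joint = vdn_joint_value([q_alloc, q_avoid])  # 1.5
delta = td_error(reward=1.0, value=joint, next_value=2.0, gamma=0.9)  # about 1.3
new_v = critic_update(joint, delta, lr=0.5)  # about 2.15
```

The additive form keeps each subproblem's value separately learnable while still yielding a single joint value for evaluation; an improved TD error would replace only the `td_error` function.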
At the same time, to improve the efficiency of task allocation, hierarchical mechanisms and regional division mechanisms are introduced to construct subsystem task planning models, which further decompose the task planning problem. Successor features combined with an improved temporal difference error method are used to train a second Critic network, which evaluates the subsystems' task allocation schemes and collaborative motion trajectories, with the aim of improving subsystem allocation efficiency. Furthermore, transfer learning is employed to merge the subsystem task plans, which then serve as constraints that guide the exploration and assessment of both the cluster task allocation schemes and the cluster collaborative motion trajectories. This enables rapid and accurate learning of task allocation within the multi-USV cluster. During Actor training, the experience replay method and the target network technique are introduced to enhance the proximal policy optimization (PPO) algorithm. This enables distributed joint training of the Actor networks and improves the accuracy of the algorithm. Simulation results validate the effectiveness and superiority of the proposed method.
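The training ingredients named above (experience replay, a target network, the proximal policy optimization (PPO) clipped objective, and successor features for the second Critic) can be sketched in their standard textbook forms. This is a framework-free illustration under assumptions: parameters are flat Python lists rather than neural networks; the target network is updated by Polyak (soft) averaging, which is one common choice and not necessarily the one used in this article; and the successor-feature update is the plain TD rule rather than the improved TD error the article combines it with. All class and function names are hypothetical.

```python
import math
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, reward, next_state, behaviour-policy log-probability)
Transition = Tuple[Sequence[float], int, float, Sequence[float], float]


class ReplayBuffer:
    """Fixed-capacity experience replay: the oldest transitions are evicted first."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def policy_ratio(new_log_prob: float, old_log_prob: float) -> float:
    """Importance ratio pi_new(a|s) / pi_old(a|s) from stored log-probabilities."""
    return math.exp(new_log_prob - old_log_prob)


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def soft_update(target: List[float], online: Sequence[float], tau: float = 0.01) -> None:
    """Polyak (soft) target-network update, in place: w' <- tau * w + (1 - tau) * w'."""
    for i, w in enumerate(online):
        target[i] = tau * w + (1.0 - tau) * target[i]


def sf_q_value(psi: Sequence[float], w: Sequence[float]) -> float:
    """Successor-feature value Q(s, a) = psi(s, a) . w, assuming reward r = phi . w."""
    return sum(p * wi for p, wi in zip(psi, w))


def sf_td_update(psi: Sequence[float], phi: Sequence[float],
                 next_psi: Sequence[float], gamma: float = 0.99,
                 lr: float = 0.1) -> List[float]:
    """Plain TD update of successor features: psi <- psi + lr * (phi + gamma * psi' - psi)."""
    return [p + lr * (f + gamma * n - p) for p, f, n in zip(psi, phi, next_psi)]


# Usage: five pushes into a buffer of capacity 3 keep only the last three.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(([float(t)], 0, 1.0, [float(t + 1)], -0.7))
batch = buf.sample(2)

target_w = [0.0, 1.0]
soft_update(target_w, [1.0, 1.0], tau=0.5)  # target_w becomes [0.5, 1.0]
```

Storing the behaviour policy's log-probability with each transition is what lets replayed, slightly stale samples enter the PPO ratio; the clip then bounds how far a single replayed batch can move the policy, and the slowly updated target parameters keep the learning targets stable.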

COLREGs-Based Path Planning for USVs Using the Deep Reinforcement Learning Strategy

Path planning and dynamic collision avoidance algorithm under COLREGs via deep reinforcement learning

COLREGs-abiding hybrid collision avoidance algorithm based on deep reinforcement learning for USVs

Real-time Planning and Collision Avoidance Control Method Based on Deep Reinforcement Learning

Collision-avoidance under COLREGS for Unmanned Surface Vehicles Via Deep Reinforcement Learning

A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field

Online path planning of an unmanned surface vehicle for real-time collision avoidance

A Novel Reinforcement Learning Collision Avoidance Algorithm for USVs Based on Maneuvering Characteristics and COLREGs

Collision Avoidance Path Planning Algorithm Research and Application of Medium-Sized USV Based on COLREGS

Intelligent Collision Avoidance Algorithms for USVs Via Deep Reinforcement Learning under COLREGs

Collision Avoidance for Unmanned Surface Vehicles Based on COLREGS

Research on Real-Time Collision Avoidance and Path Planning of USVs in Multi-Obstacle Ships Environment

Unmanned Surface Vehicle Collision Avoidance Path Planning in Restricted Waters Using Multi-Objective Optimisation Complying with COLREGs

Motion Planning of USV Based on Marine Rules

COLREGs-Compliant Unmanned Surface Vehicles Collision Avoidance Based on Multi-Objective Genetic Algorithm

Collision Avoidance and Path Point Tracking Control for Underactuated Unmanned Surface Vehicles with Unknown Model Nonlinearity

Local Collision Avoidance Algorithm for a Unmanned Surface Vehicle Based on Steering Maneuver Considering COLREGs

COLREG-Compliant Collision Avoidance for Unmanned Surface Vehicle using Deep Reinforcement Learning

Multi-USV Formation Collision Avoidance via Deep Reinforcement Learning and COLREGs

Multi-USV Task Planning Method Based on Improved Deep Reinforcement Learning

A new real‐time path planning for USV based on dynamic artificial potential field in complex environments