Abstract: A safe and reliable task planning method is a prerequisite for multiple unmanned surface vessels (multi-USVs) to collaboratively collect ocean observation data. Deep reinforcement learning (DRL) combines the nonlinear function-fitting capability of deep neural networks with the decision-making and control abilities of reinforcement learning, providing a novel approach to the multi-USV task planning problem. However, when applied to multi-USV task planning, DRL faces challenges such as a vast exploration space, long training times, and an unstable training process. To address these challenges, this article proposes a multi-USV task planning method based on improved DRL. The proposed method draws on the idea of a value decomposition network, breaking the multi-USV task planning problem down into two subproblems: 1) task allocation and 2) autonomous collision avoidance. A separate state space, action space, and reward function are designed for each subproblem. A deep neural network then maps the state space of each subproblem to the action space of each USV, and the strategy it generates is assessed with the corresponding reward function. This integrates task allocation and path planning into a single task planning framework. The deep neural networks consist of Actor networks and Critic networks. During the training phase of the Critic networks, different methods are used to train different Critic networks to improve the convergence speed of the algorithm. An improved temporal difference error method is applied to train the Critic network that evaluates autonomous collision avoidance strategies, thereby improving the autonomous collision avoidance ability of the USVs.
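As a rough illustration of the decomposition described above, the following minimal sketch keeps one critic per subproblem, feeds each its own reward, and updates each with a temporal difference error. It uses the standard one-step TD error, not the improved variant, whose details the abstract does not give. The tabular critic, the state labels, the reward values, the hyperparameters, and the additive combination of the two values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TabularCritic:
    """Toy critic: a lookup table of state values (stand-in for a Critic network)."""
    gamma: float = 0.9   # discount factor (assumed value)
    lr: float = 0.5      # learning rate (assumed value)
    values: dict = field(default_factory=dict)

    def value(self, state):
        # Unvisited states default to a value of zero.
        return self.values.get(state, 0.0)

    def td_error(self, state, reward, next_state, done):
        # Standard one-step TD error: r + gamma * V(s') - V(s).
        bootstrap = 0.0 if done else self.gamma * self.value(next_state)
        return reward + bootstrap - self.value(state)

    def update(self, state, reward, next_state, done):
        # Move V(s) a fraction lr along the TD error and return that error.
        delta = self.td_error(state, reward, next_state, done)
        self.values[state] = self.value(state) + self.lr * delta
        return delta


# One critic per subproblem, each trained on its own reward signal.
allocation_critic = TabularCritic()
avoidance_critic = TabularCritic()

# Hypothetical transition "s0" -> "s1": progress toward a task earns an
# allocation reward, while passing near an obstacle incurs a small penalty.
delta_alloc = allocation_critic.update("s0", reward=1.0, next_state="s1", done=False)
delta_avoid = avoidance_critic.update("s0", reward=-0.2, next_state="s1", done=False)

# Additive combination in the spirit of value decomposition (an assumption;
# the abstract does not state how the two evaluations are combined).
joint_value = allocation_critic.value("s0") + avoidance_critic.value("s0")
```

Keeping the critics separate means each value estimate is driven only by the reward of its own subproblem, which is one plausible reading of why the decomposition would shrink the exploration space.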
At the same time, to improve the efficiency of task allocation, hierarchical and regional division mechanisms are introduced to construct subsystem task planning models, which further decompose the task planning problem. A combination of successor features and an improved temporal difference error method is applied to train another Critic network that evaluates the subsystems' task allocation schemes and collaborative motion trajectories, with the aim of enhancing the allocation efficiency of the subsystems. Furthermore, transfer learning is employed to merge the subsystem task planning and use it as a constraint that directs the exploration and assessment of both the cluster task allocation schemes and the cluster collaborative motion trajectories. This enables rapid and accurate learning of task allocation within the multi-USV cluster. During the training phase of the Actor network, the experience replay method and the target network technique are introduced to enhance the proximal policy optimization algorithm. This facilitates distributed joint training of the Actor network, thereby improving the accuracy of the algorithm. Simulation results validate the effectiveness and superiority of the proposed method.
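To make the Actor-side ingredients concrete, the sketch below shows the standard clipped surrogate objective of proximal policy optimization evaluated on samples drawn from a replay buffer, together with a soft target-network update. It is not the article's implementation. The function names, the buffer contents, the clipping range, and the choice of a soft (Polyak) update instead of a periodic hard copy are all assumptions.

```python
import math
from collections import deque


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = math.exp(logp_new - logp_old)  # probability ratio r
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def soft_update(target_params, online_params, tau=0.1):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


# Replay buffer of (log-prob when the action was taken, advantage estimate).
# The stored values are hypothetical.
replay = deque(maxlen=1000)
replay.append((math.log(0.5), 1.0))
replay.append((math.log(0.5), -1.0))

# Hypothetical log-prob of the same action under the updated policy.
logp_new = math.log(0.9)

# Average surrogate over the stored samples; an optimizer would maximize this.
objective = sum(
    ppo_clipped_objective(logp_new, logp_old, adv) for logp_old, adv in replay
) / len(replay)

# Slowly track the online parameters with the target parameters.
target = soft_update([0.0, 1.0], [1.0, 0.0])
```

In this toy batch the probability ratio is 1.8, so the positive-advantage sample is clipped to 1.2 while the negative-advantage sample keeps its unclipped value of -1.8. This shows how the clipping limits how far a single update can push the policy on favourable samples.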

Multi-USV Task Planning Method Based on Improved Deep Reinforcement Learning
