Abstract: A safe and reliable task planning method is a prerequisite for the collaborative execution of ocean observation data collection tasks by multiple unmanned surface vessels (multi-USVs). Deep reinforcement learning (DRL) combines the powerful nonlinear function-fitting capabilities of deep neural networks with the decision-making and control abilities of reinforcement learning, providing a novel approach to the multi-USV task planning problem. However, when applied to multi-USV task planning, DRL faces challenges such as a vast exploration space, long training times, and an unstable training process. To address these challenges, this article proposes a multi-USV task planning method based on improved DRL. The proposed method draws on the idea of a value decomposition network, breaking down the multi-USV task planning problem into two subproblems: 1) task allocation and 2) autonomous collision avoidance. A separate state space, action space, and reward function are designed for each subproblem. On this basis, a deep neural network maps the state space of each subproblem to the action space of each USV, and the strategy generated by the network is assessed with the corresponding reward function. This integrates task allocation and path planning into a single task planning framework. The deep neural networks consist of Actor networks and Critic networks. During the training phase of the Critic networks, different methods are used to train different Critic networks to improve the convergence speed of the algorithm. An improved temporal difference error method is applied to train the Critic network that evaluates autonomous collision avoidance strategies, thereby improving the autonomous collision avoidance ability of the USVs.
At the same time, to improve the efficiency of task allocation, hierarchical and regional division mechanisms are introduced to construct subsystem task planning models, which further decompose the task planning problem. A combination of successor features and an improved temporal difference error method is applied to train another Critic network that evaluates the subsystem's task allocation schemes and collaborative motion trajectories, with the aim of enhancing the allocation efficiency of the subsystems. Furthermore, transfer learning is employed to merge the subsystem task planning, which is then used as a constraint to direct the exploration and assessment of both the cluster task allocation schemes and the cluster collaborative motion trajectories. This enables rapid and accurate learning of task allocation within the multi-USV cluster. During the training phase of the Actor network, experience replay and a target network are introduced to enhance the proximal policy optimization algorithm. This facilitates distributed joint training of the Actor network, thereby improving the accuracy of the algorithm. Simulation results validate the effectiveness and superiority of the proposed method.
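
As a rough illustration of the decomposition described above, the sketch below computes a separate one-step temporal difference (TD) error for each subproblem, each from its own reward and its own Critic estimate. It uses the standard TD error, not the improved variant proposed in the article, whose details the abstract does not give. All names (`Transition`, `td_error`, `decomposed_td_errors`, the reward fields, and the discount factor `gamma`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One environment step for a single USV (hypothetical structure)."""
    reward_alloc: float  # assumed reward signal for the task allocation subproblem
    reward_avoid: float  # assumed reward signal for the collision avoidance subproblem
    done: bool           # True if the episode ended on this step


def td_error(reward: float, value: float, next_value: float,
             done: bool, gamma: float = 0.99) -> float:
    """Standard one-step TD error: r + gamma * V(s') - V(s).

    The bootstrap term is dropped on terminal steps.
    """
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def decomposed_td_errors(step: Transition,
                         v_alloc: float, v_alloc_next: float,
                         v_avoid: float, v_avoid_next: float,
                         gamma: float = 0.99) -> dict:
    """Compute one TD error per subproblem, each with its own Critic values.

    v_alloc / v_avoid stand in for the outputs of two separate Critic
    networks; here they are plain floats to keep the sketch self-contained.
    """
    return {
        "alloc": td_error(step.reward_alloc, v_alloc, v_alloc_next,
                          step.done, gamma),
        "avoid": td_error(step.reward_avoid, v_avoid, v_avoid_next,
                          step.done, gamma),
    }


if __name__ == "__main__":
    step = Transition(reward_alloc=1.0, reward_avoid=-0.5, done=False)
    errors = decomposed_td_errors(step, v_alloc=0.5, v_alloc_next=2.0,
                                  v_avoid=0.0, v_avoid_next=1.0, gamma=0.9)
    print(errors)
```

In a full Actor-Critic setup, each of these errors would drive the loss of its own Critic, which is one plausible reading of how different Critic networks can be trained with different methods.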

A Real-time Algorithm for USV Navigation Based on Deep Reinforcement Learning

Path planning of autonomous underwater vehicle in unknown environment based on improved deep reinforcement learning

Obstacle avoidance USV in multi-static obstacle environments based on a deep reinforcement learning approach

Local path planning for unmanned surface vehicle based on spatial and temporal sensing-enhanced deep Q-network

Sim-to-Real: Mapless Navigation for USVs Using Deep Reinforcement Learning

Path Planning of Unmanned Underwater Vehicles Based on Deep Reinforcement Learning Algorithm

Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle

Real-time Planning and Collision Avoidance Control Method Based on Deep Reinforcement Learning

Target Search Control of AUV in Underwater Environment with Deep Reinforcement Learning

Dynamic Obstacle Avoidance for USVs Using Cross-Domain Deep Reinforcement Learning and Neural Network Model Predictive Controller

AUV Obstacle Avoidance Planning Based on Deep Reinforcement Learning

USV Formation Navigation Decision-Making Through Hybrid Deep Reinforcement Learning Using Self-Attention Mechanism

A Greedy Navigation and Subtle Obstacle Avoidance Algorithm for USV Using Reinforcement Learning

DRL-based Path Planning and Obstacle Avoidance of Autonomous Underwater Vehicle

Path Planning based on Deep Reinforcement Learning for Autonomous Underwater Vehicles under Ocean Current Disturbance

Path Following Optimization for an Underactuated USV Using Smoothly-Convergent Deep Reinforcement Learning

A path planning approach for unmanned surface vehicles based on dynamic and fast Q-learning

A Path Planning Method Based on Deep Reinforcement Learning for AUV in Complex Marine Environment

Path Planning of Unmanned Surface Vehicle Based on Improved Q-Learning Algorithm

Multi-USV Task Planning Method Based on Improved Deep Reinforcement Learning

Multi-USV System Cooperative Underwater Target Search Based on Reinforcement Learning and Probability Map