Abstract:This paper investigates trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods which employ single actor-critic but cannot realize satisfactory tracking control accuracy and stable learning, our proposed algorithm can achieve high-level tracking control accuracy of AUVs and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and action-value function, respectively. Specifically, for the critics, the expected absolute Bellman error based updating rule is used to choose the worst critic to be updated in each time step. Subsequently, to calculate the loss function with more accurate target value for the chosen critic, Pseudo Q-learning, which uses sub-greedy policy to replace the greedy policy in Q-learning, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of action-value function and to stabilize the learning. As for the actors, deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, the stability analysis of the learning is given qualitatively. The effectiveness and generality of the proposed MPQ-based Deterministic Policy Gradient (MPQ-DPG) algorithm are verified by the application on AUV with two different reference trajectories. And the results demonstrate high-level tracking control accuracy and stable learning of MPQ-DPG. Besides, the results also validate that increasing the number of the actors and critics will further improve the performance.

Neural-network-based Deterministic Policy Gradient for Depth Control of AUVs

Learning an End-To-End Policy for AUV Control Within Just Forty Minutes Using Parallel Simulation

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Depth Control of Model-Free AUVs Via Reinforcement Learning

Neural Network Model-Based Reinforcement Learning Control for AUV 3-D Path Following

AUV path following controlled by modified Deep Deterministic Policy Gradient

A Fast Adaptive AUV Control Policy Based on Progressive Networks with Context Information

Reinforcement Learning Based Obstacle Avoidance for AUV Swarm in Dynamic Ocean Environment

Safe deep reinforcement learning-based adaptive control for USV interception mission

AUV Path Following Control using Deep Reinforcement Learning Under the Influence of Ocean Currents.

Target tracking strategy using deep deterministic policy gradient

Path Following for Autonomous Ground Vehicle Using DDPG Algorithm: A Reinforcement Learning Approach

Intelligent AUV Surfacing Control in Network Attack Scenario

Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Deep Reinforcement Learning

Single degree of freedom control based on deep reinforcement learning for underwater unmanned vehicle

High-Level Tracking of Autonomous Underwater Vehicles Based on Pseudo Averaged Q-Learning.

Path-Following Control of Unmanned Underwater Vehicle Based on an Improved TD3 Deep Reinforcement Learning

Self-optimizing Dynamic Position Control of UUV in the Underwater Transportation

UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient

An Automatic Driving Control Method Based on Deep Deterministic Policy Gradient

Sim-to-Real Transfer of Adaptive Control Parameters for AUV Stabilization under Current Disturbance