Learning an End-To-End Policy for AUV Control Within Just Forty Minutes Using Parallel Simulation

Shuguang Chu, Dejun Li, Mingwei Lin
DOI: https://doi.org/10.1109/oceans51537.2024.10682328
2024-01-01
Abstract: Autonomous Underwater Vehicles (AUVs) have become essential tools for ocean environmental observation because they integrate a variety of sensors cost-effectively. Deep Reinforcement Learning (DRL) offers a promising approach to complex underwater control problems, but it is limited by poor sample efficiency. This paper presents a distributed parallel framework that uses multi-threading and parallel simulations in Gazebo to overcome this constraint, significantly accelerating training. The framework can serve as a general simulation platform for training policies that control underwater vehicles, helping to improve AUV autonomy in complex underwater environments with state-of-the-art DRL algorithms. To verify its capability, a challenging docking control policy for an AUV is trained using the Proximal Policy Optimization (PPO) algorithm. Total training time is less than forty minutes, a remarkable speedup over existing methods. The framework has the potential to expedite the development of AUV control algorithms and enhance AUV autonomy in complex environments.
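The abstract's core idea, several simulations stepped concurrently so that one learner receives a larger batch of experience per unit of wall-clock time, can be illustrated with a minimal Python sketch. This is not the authors' implementation: `ToyDockingEnv`, `policy`, `rollout_worker`, and `collect_parallel` are hypothetical names, the environment is a 1-D toy standing in for a Gazebo instance, and the fixed proportional controller stands in for a PPO policy. The sketch also assumes that each simulator step mostly waits on an external process, which is the case where Python threads can overlap work.

```python
import random
import threading
from dataclasses import dataclass


@dataclass
class Transition:
    worker_id: int
    state: float
    action: float
    reward: float
    done: bool


class ToyDockingEnv:
    """Hypothetical 1-D stand-in for one simulator instance: drive x toward 0."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.x = 0.0

    def reset(self):
        self.x = self.rng.uniform(-1.0, 1.0)
        return self.x

    def step(self, action):
        self.x += action
        reward = -abs(self.x)        # closer to the dock (x = 0) is better
        done = abs(self.x) < 0.05    # "docked" when close enough
        return self.x, reward, done


def policy(state):
    # Placeholder proportional controller; a trained PPO policy would go here.
    return -0.5 * state


def rollout_worker(worker_id, steps, out, lock):
    """Step one private environment and append its transitions to the shared batch."""
    env = ToyDockingEnv(seed=worker_id)
    state = env.reset()
    local = []
    for _ in range(steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        local.append(Transition(worker_id, state, action, reward, done))
        state = env.reset() if done else next_state
    with lock:                       # one guarded write per worker
        out.extend(local)


def collect_parallel(num_workers=4, steps_per_worker=50):
    """Run one thread per environment and return the combined batch."""
    batch = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=rollout_worker,
                         args=(i, steps_per_worker, batch, lock))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return batch


if __name__ == "__main__":
    batch = collect_parallel()
    print(f"collected {len(batch)} transitions")
```

In a full training loop, the returned batch would feed one policy update before the workers collect again with the updated policy; that update step is omitted here.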