Continuous Control for Autonomous Underwater Vehicle Path Following Using Deep Interactive Reinforcement Learning

Qilei Zhang, Chunxi Cheng, Zheng Fang, Dong Jiang, Bo He, Guangliang Li
DOI: https://doi.org/10.1109/mlcr57210.2022.00013
2022-01-01
Abstract: With the increasing demand for ocean exploration, Autonomous Underwater Vehicles (AUVs) face higher requirements for both autonomy and intelligence. To this end, deep reinforcement learning methods have been used in recent years to improve AUV autonomy and intelligence. However, the low learning efficiency and high learning cost of traditional deep reinforcement learning prevent these methods from being applied to physical AUV systems in real underwater environments. Therefore, this paper proposes a deep interactive reinforcement learning method based on the Deep Deterministic Policy Gradient (DDPG) algorithm for continuous motion control in AUV path following. The highlight of the proposed method is the design of a new reward allocator. Specifically, unlike current deep interactive reinforcement learning methods, we allow the human trainer to provide a preferred action based on an evaluation of the AUV's current situation. The reward allocator then assigns rewards indirectly based on the preferred action, to cope with the high frequency of continuous action changes of the AUV. The proposed method was tested on a sinusoidal curve following task in the Gazebo simulation platform with our lab's AUV simulator. The experimental results and analysis show that, with the proposed method, AUV path following learns a more stable policy about 100 episodes faster than learning from only environmental rewards or only human rewards.
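The idea of assigning rewards indirectly from a preferred action can be illustrated with a minimal sketch. The abstract does not give the allocator's formula, so everything below is an assumption made for illustration: the function names `allocate_reward` and `combined_reward`, the exponential distance-based shaping, and the `scale` and `weight` parameters are hypothetical and are not taken from the paper.

```python
import math


def allocate_reward(agent_action, preferred_action, scale=1.0):
    """Hypothetical allocator: the reward decays as the agent's continuous
    action moves away from the trainer's preferred action.

    Returns a value in (0, 1], equal to 1.0 when the actions coincide.
    """
    if len(agent_action) != len(preferred_action):
        raise ValueError("action vectors must have the same length")
    # Euclidean distance between the two action vectors.
    dist = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(agent_action, preferred_action))
    )
    return math.exp(-dist / scale)


def combined_reward(env_reward, agent_action, preferred_action=None, weight=0.5):
    """Assumed blending rule: add the allocated human reward to the
    environment reward only when the trainer supplied a preferred action."""
    if preferred_action is None:
        return env_reward
    return env_reward + weight * allocate_reward(agent_action, preferred_action)


if __name__ == "__main__":
    # The trainer prefers a small rudder command; the agent chose a larger one.
    print(combined_reward(-1.0, [0.4, 0.0], preferred_action=[0.1, 0.0]))
```

In this sketch the trainer never scores individual actions directly. A single preferred action can be reused to score many consecutive agent actions, which is one plausible way to handle actions that change at high frequency.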