DoCRL: Double Critic Deep Reinforcement Learning for Mapless Navigation of a Hybrid Aerial Underwater Vehicle with Medium Transition

Ricardo B. Grando, Junior C. de Jesus, Victor A. Kich, Alisson H. Kolling, Rodrigo S. Guerra, Paulo L. J. Drews-Jr
2023-08-19
Abstract: Deep Reinforcement Learning (Deep-RL) techniques for motion control have been continuously used to deal with decision-making problems for a wide variety of robots. Previous works showed that Deep-RL can be applied to perform mapless navigation, including the medium transition of Hybrid Unmanned Aerial Underwater Vehicles (HUAUVs). These are robots that can operate in both air and water media, with future potential for rescue tasks in robotics. This paper presents new approaches based on state-of-the-art double-critic actor-critic algorithms to address the navigation and medium transition problems for a HUAUV. We show that double-critic Deep-RL with Recurrent Neural Networks, using solely range data and relative localization, improves the navigation performance of HUAUVs. Our DoCRL approaches achieved better navigation and transitioning capability, outperforming previous approaches.
Robotics
What problem does this paper attempt to address?
The paper addresses how to use Deep Reinforcement Learning (Deep-RL) to achieve autonomous mapless navigation, including the transition between air and water media, for Hybrid Unmanned Aerial Underwater Vehicles (HUAUVs). Specifically, it focuses on:

1. **Mapless navigation**: How can HUAUVs navigate to a target location without a pre-constructed map?
2. **Medium transition**: How can HUAUVs transition smoothly between air and water while maintaining good navigation performance in both media?
3. **Obstacle avoidance**: How can HUAUVs avoid colliding with obstacles during navigation?

To address these challenges, the paper proposes DoCRL (Double Critic Deep Reinforcement Learning), a Deep-RL approach built on a double-critic structure. It comes in two variants, one deterministic (DoCRL-D) and one stochastic (DoCRL-S). Both use Long Short-Term Memory (LSTM) networks to process time-series data, and both take only range data and relative localization data as input.

### Main contributions

1. **Two Deep-RL methods based on the double-critic structure**: DoCRL-D (based on the TD3 algorithm) and DoCRL-S (based on the SAC algorithm), both of which perform well in mapless navigation tasks.
2. **Use of the LSTM architecture**: Compared with a traditional Multi-Layer Perceptron (MLP) architecture, the LSTM achieves better overall performance.
3. **Demonstrated robustness in simulation**: In simulated air-water rescue tasks, the HUAUV navigates successfully, completes medium transitions, and avoids collisions.

### Experimental results

The experimental results show that the DoCRL methods perform well in both air-to-water and water-to-air navigation tasks. In the air-to-water task in particular, DoCRL-D reaches a 100% success rate with very stable navigation times. The DoCRL methods also show good robustness and generalization in complex environments.

### Conclusion

Through experiments in a physically realistic simulated environment, the paper verifies the effectiveness of the proposed DoCRL methods for mapless navigation, obstacle avoidance, and medium transition. The methods outperform existing Deep-RL methods that use a single-critic structure, as well as a traditional Behavior-Based Algorithm (BBA). Future research will further explore the potential of HUAUVs in practical applications, especially robotic rescue tasks.
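
To make the double-critic idea concrete, the sketch below shows the bootstrapped value target used by double-critic algorithms such as TD3 and SAC, where the smaller of two critic estimates limits value overestimation. It is a minimal illustration, not the authors' implementation: the function name, argument names, and default values are hypothetical, and the critic outputs are assumed to have already been evaluated at the next state and next action.

```python
def double_critic_target(reward, done, q1_next, q2_next,
                         gamma=0.99, alpha=0.0, next_log_prob=0.0):
    """Return a TD target built from the minimum of two critic estimates.

    All names and defaults here are illustrative assumptions.
    With alpha = 0.0 this is the TD3-style (deterministic) target.
    With alpha > 0.0 the SAC-style entropy term is subtracted, where
    next_log_prob is the log-probability of the sampled next action.
    """
    # Take the more pessimistic of the two critics.
    min_q = min(q1_next, q2_next)
    # Soft value of the next state (reduces to min_q when alpha is 0).
    next_value = min_q - alpha * next_log_prob
    # No bootstrapping past a terminal step (collision or goal reached).
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * next_value


# Usage with made-up numbers, for illustration only.
td3_style = double_critic_target(1.0, False, 2.0, 3.0, gamma=0.5)
sac_style = double_critic_target(0.0, False, 4.0, 2.0, gamma=0.5,
                                 alpha=0.5, next_log_prob=-2.0)
print(td3_style, sac_style)
```

Both critics would be regressed toward this shared target. In a single-critic method the `min` disappears and the target uses one estimate directly, which is the baseline structure the summary compares against.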