Abstract: This paper presents DU-VIO, a dehazing-aided hybrid multi-rate multi-modal Visual-Inertial Odometry (VIO) estimation framework designed for extreme underwater environments. DU-VIO incorporates a GAN-based pre-processing module and a hybrid CNN-LSTM module for pose estimation, using visibility-enhanced underwater images and raw IMU data. Accurate pose estimation is essential for many underwater robotics and exploration applications. However, underwater visibility is often compromised by suspended particles and attenuation effects, which makes visual-inertial pose estimation difficult. DU-VIO aims to overcome these limitations by removing visual disturbances from raw image data, thereby improving the quality of the image features used for pose estimation. We demonstrate the effectiveness of DU-VIO by calculating RMSE scores for the translation and rotation vectors against their reference values, and we compare these scores with those of a base model on a modified AQUALOC dataset. DU-VIO offers a robust approach to the persistent challenge of underwater visibility and improves the accuracy of pose estimation. This research contributes insights and tools for advancing underwater technology, with implications for scientific research, environmental monitoring, and industrial applications.

What problem does this paper attempt to address?

This paper addresses how reduced visibility, caused by suspended particles and light attenuation, degrades the accuracy of visual-inertial odometry (VIO) pose estimation in extreme underwater environments. Specifically:

1. **Visual Interference in Extreme Underwater Environments**: Low-texture, distorted, and turbid images are common underwater and seriously degrade the accuracy of vision-based pose estimation.
2. **Limitations of Traditional Methods**: Traditional geometry-based VIO methods are unreliable under dynamic lighting, in featureless scenes, and with unclear images, so they adapt poorly to complex real-world environments.
3. **The Need for Multi-Modal Data Fusion**: Improving pose accuracy requires a method that can handle multi-rate, multi-modal data (such as camera images and IMU data) and fuse them effectively.

To address these problems, the paper proposes the DU-VIO framework, whose main features are:

- **Dehazing Module**: A generative adversarial network (GAN) pre-processing module removes visual disturbances from the raw image and enhances image quality.
- **Hybrid CNN-LSTM Architecture**: A convolutional neural network (CNN) and a long short-term memory (LSTM) network extract features from the dehazed images and the raw IMU data for 6-degree-of-freedom (6DoF) pose estimation.
- **Multi-Modal Data Fusion**: A multi-modal fusion module combines the visual and inertial features to further improve the accuracy of pose estimation.

With these components, the DU-VIO framework aims to improve the accuracy of pose estimation in extreme underwater environments and so provide more reliable technical support for underwater robots and exploration applications.
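To make the "multi-rate" part concrete, the sketch below shows one common way to pair a slow camera stream with a faster IMU stream: each pair of consecutive frames receives the IMU samples recorded between them. The summary does not say how DU-VIO synchronizes its inputs, so the function name `group_imu_by_frame_pairs`, the half-open window convention, and the timestamps in the example are all assumptions made for illustration.

```python
from bisect import bisect_left


def group_imu_by_frame_pairs(frame_times, imu_times, imu_samples):
    """Collect, for each consecutive frame pair (t_k, t_{k+1}), the IMU
    samples whose timestamps satisfy t_k <= t < t_{k+1}.

    Hypothetical helper, not taken from the paper. Both timestamp lists
    must be sorted in ascending order and use the same time unit.
    """
    if len(imu_times) != len(imu_samples):
        raise ValueError("imu_times and imu_samples must have equal length")

    windows = []
    for t_start, t_end in zip(frame_times, frame_times[1:]):
        lo = bisect_left(imu_times, t_start)  # first sample at or after t_start
        hi = bisect_left(imu_times, t_end)    # first sample at or after t_end
        windows.append(imu_samples[lo:hi])
    return windows


# Illustrative rates only: frames every 100 ms, IMU samples every 25 ms.
frame_times_ms = [0, 100, 200]
imu_times_ms = list(range(0, 201, 25))
imu_samples = list(range(len(imu_times_ms)))  # stand-ins for IMU readings

imu_windows = group_imu_by_frame_pairs(frame_times_ms, imu_times_ms, imu_samples)
```

Each window can then be treated as one variable-length inertial sequence accompanying one image pair, which is the kind of input a recurrent model such as an LSTM is designed to consume.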
### Summary of Mathematical Formulas

The paper evaluates pose estimation with the root-mean-square error (RMSE) of the translation and rotation vectors. The formula summarized here is the pose loss \(L_{\text{pose}}\), a weighted mean squared error over translation and rotation; it contains no square root, so it is not itself the RMSE:

\[ L_{\text{pose}}=\frac{1}{T-1}\sum_{t=1}^{T-1}\left(\|\mathbf{v}_t-\hat{\mathbf{v}}_t\|_2^2+\alpha\|\boldsymbol{\phi}_t-\hat{\boldsymbol{\phi}}_t\|_2^2\right) \]

where:

- \(T\) is the sequence length,
- \(\mathbf{v}_t\) and \(\boldsymbol{\phi}_t\) are the true values of translation and rotation respectively,
- \(\hat{\mathbf{v}}_t\) and \(\hat{\boldsymbol{\phi}}_t\) are the predicted values of translation and rotation respectively,
- \(\alpha\) is a weighting parameter that balances the translational and rotational losses.

The paper compares DU-VIO's RMSE scores with those of a base model on a modified AQUALOC dataset.
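The following is a minimal pure-Python sketch of the two quantities discussed in this section, assuming each translation and rotation is given as a plain list of floats. The `pose_loss` function follows the \(L_{\text{pose}}\) formula term by term. The `rmse` function uses one common definition (the square root of the mean squared Euclidean error per vector), which is an assumption because the summary does not give the paper's exact RMSE definition. The default `alpha=1.0` is a placeholder, not a value from the paper.

```python
import math


def _squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pose_loss(v_true, v_pred, phi_true, phi_pred, alpha=1.0):
    """Weighted mean squared pose error over the T-1 steps of a sequence.

    alpha balances the rotational term against the translational term;
    the default here is a placeholder, not the paper's setting.
    """
    steps = len(v_true)
    if steps == 0 or not (len(v_pred) == len(phi_true) == len(phi_pred) == steps):
        raise ValueError("all inputs must be non-empty and of equal length")

    total = 0.0
    for v, v_hat, phi, phi_hat in zip(v_true, v_pred, phi_true, phi_pred):
        total += _squared_l2(v, v_hat) + alpha * _squared_l2(phi, phi_hat)
    return total / steps


def rmse(true_vecs, pred_vecs):
    """Square root of the mean squared Euclidean error per vector
    (one common RMSE convention, assumed here)."""
    if len(true_vecs) == 0 or len(true_vecs) != len(pred_vecs):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum(_squared_l2(t, p) for t, p in zip(true_vecs, pred_vecs)) / len(true_vecs)
    return math.sqrt(mean_sq)


# Toy data: two steps with 3-D translation and rotation vectors.
v_true = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
v_pred = [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]]
phi_true = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
phi_pred = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

loss = pose_loss(v_true, v_pred, phi_true, phi_pred, alpha=2.0)
translation_rmse = rmse(v_true, v_pred)
rotation_rmse = rmse(phi_true, phi_pred)
```

Keeping the two functions separate mirrors the distinction above: the weighted loss mixes translation and rotation into one training objective, while RMSE is reported for each quantity on its own.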

Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain

UniVIO: Unified Direct and Feature-Based Underwater Stereo Visual-Inertial Odometry

Real-time pose estimation for an underwater object combined with deep learning and prior information

SM/VIO: Robust Underwater State Estimation Switching Between Model-based and Visual Inertial Odometry

DeepURL: Deep Pose Estimation Framework for Underwater Relative Localization

Comparative study on real-time pose estimation of vision-based unmanned underwater vehicles

Pose Estimation from Camera Images for Underwater Inspection

Unsupervised Deep Persistent Monocular Visual Odometry and Depth Estimation in Extreme Environments

Vision-Based Autonomous Navigation for Unmanned Surface Vessel in Extreme Marine Conditions

UDepth: Fast Monocular Depth Estimation for Visually-guided Underwater Robots

Real-Time Monocular Visual Odometry for Turbid and Dynamic Underwater Environments

Deep Learning for Enhanced Marine Vision: Object Detection in Underwater Environments

Real-time Image Enhancement for Vision-based Autonomous Underwater Vehicle Navigation in Murky Waters

A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement

Multi-Sensor Fusion Self-Supervised Deep Odometry and Depth Estimation

UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors

SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation

A vision based system for underwater docking

Model-Based Underwater 6D Pose Estimation from RGB

Unsupervised Multiple Representation Disentanglement Framework for Improved Underwater Visual Perception