Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain

Vidya Sudevan, Fakhreddine Zayer, Taimur Hassan, Sajid Javed, Hamad Karki, Giulia De Masi, Jorge Dias
2024-11-21
Abstract: This paper delves into the potential of DU-VIO, a dehazing-aided hybrid multi-rate multi-modal Visual-Inertial Odometry (VIO) estimation framework designed for extreme underwater environments. The DU-VIO framework incorporates a GAN-based pre-processing module and a hybrid CNN-LSTM module for precise pose estimation, using visibility-enhanced underwater images and raw IMU data. Accurate pose estimation is paramount for various underwater robotics and exploration applications. However, underwater visibility is often compromised by suspended particles and attenuation effects, rendering visual-inertial pose estimation a formidable challenge. DU-VIO aims to overcome these limitations by effectively removing visual disturbances from raw image data, enhancing the quality of image features used for pose estimation. We demonstrate the effectiveness of DU-VIO by calculating RMSE scores for translation and rotation vectors in comparison to their reference values. These scores are then compared to those of a base model using a modified AQUALOC Dataset. This study's significance lies in its potential to revolutionize underwater robotics and exploration. DU-VIO offers a robust solution to the persistent challenge of underwater visibility, significantly improving the accuracy of pose estimation. This research contributes valuable insights and tools for advancing underwater technology, with far-reaching implications for scientific research, environmental monitoring, and industrial applications.
Robotics
What problem does this paper attempt to address?
This paper addresses the loss of accuracy in visual-inertial odometry (VIO) pose estimation when visibility is reduced by suspended particles and attenuation effects in extreme underwater environments. Specifically:

1. **Visual Interference in Extreme Underwater Environments**: Low-texture, distorted, and turbid images are common underwater and seriously degrade the accuracy of vision-based pose estimation.
2. **Limitations of Traditional Methods**: Traditional geometry-based VIO methods are unreliable under dynamic lighting, in featureless environments, and with unclear images, and they adapt poorly to complex real-world conditions.
3. **The Need for Multi-Modal Data Fusion**: Improving pose accuracy requires a method that can handle multi-rate multi-modal data (such as camera images and IMU data) and fuse it effectively.

To solve these problems, the paper proposes the DU-VIO framework, whose main features include:

- **Dehazing Module**: A generative adversarial network (GAN) pre-processing module removes visual disturbances from the raw image and enhances image quality.
- **Hybrid CNN-LSTM Architecture**: A convolutional neural network (CNN) and a long short-term memory network (LSTM) are combined to extract features from the dehazed image and the raw IMU data for 6-degree-of-freedom (6DoF) pose estimation.
- **Multi-Modal Data Fusion**: A multi-modal fusion module combines visual and inertial features to further improve the accuracy of pose estimation.

Through these components, the DU-VIO framework aims to significantly improve the accuracy of pose estimation in extreme underwater environments, providing more reliable technical support for underwater robots and exploration applications.
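To make the "multi-rate" part of the fusion problem concrete, the following is a minimal sketch of one common way to pair a slow camera stream with a faster IMU stream: each pair of consecutive image frames is matched with the IMU samples recorded between them. The function name `group_imu_between_frames`, the 10 Hz and 100 Hz rates, and the six-channel IMU layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def group_imu_between_frames(frame_times, imu_times, imu_samples):
    """Return one IMU window per consecutive frame pair.

    Window k holds the IMU samples with frame_times[k] <= t < frame_times[k + 1].
    Both timestamp arrays are assumed to be sorted in ascending order.
    """
    frame_times = np.asarray(frame_times, dtype=float)
    imu_times = np.asarray(imu_times, dtype=float)
    imu_samples = np.asarray(imu_samples, dtype=float)
    # Index of the first IMU sample at or after each frame timestamp.
    bounds = np.searchsorted(imu_times, frame_times, side="left")
    return [imu_samples[bounds[k]:bounds[k + 1]]
            for k in range(len(frame_times) - 1)]


# Hypothetical rates: camera at 10 Hz, IMU at 100 Hz, timestamps in milliseconds.
frame_times_ms = [0.0, 100.0, 200.0]
imu_times_ms = np.arange(0.0, 200.0, 10.0)       # 20 IMU timestamps
imu_samples = np.zeros((len(imu_times_ms), 6))   # assumed layout: 3 gyro + 3 accel channels
imu_samples[:, 0] = imu_times_ms                 # tag each row so the windows can be inspected

windows = group_imu_between_frames(frame_times_ms, imu_times_ms, imu_samples)
print([w.shape for w in windows])
```

Each window could then be fed to a sequence model alongside the features of the matching image pair; how DU-VIO actually aligns and fuses the two streams is defined in the paper itself.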
### Summary of Mathematical Formulas

The paper's main evaluation metric is the root-mean-square error (RMSE), which quantifies the translational and rotational errors of the pose estimate. The formula reported here, written \(L_{\text{pose}}\), is a weighted mean-squared pose loss over the sequence rather than an RMSE itself:

\[ L_{\text{pose}} = \frac{1}{T-1}\sum_{t=1}^{T-1}\left(\|\mathbf{v}_t-\hat{\mathbf{v}}_t\|_2^2+\alpha\,\|\boldsymbol{\phi}_t-\hat{\boldsymbol{\phi}}_t\|_2^2\right) \]

where:

- \(T\) is the sequence length,
- \(\mathbf{v}_t\) and \(\boldsymbol{\phi}_t\) are the true values of translation and rotation respectively,
- \(\hat{\mathbf{v}}_t\) and \(\hat{\boldsymbol{\phi}}_t\) are the predicted values of translation and rotation respectively,
- \(\alpha\) is a weighting parameter that balances the translational and rotational losses.

The RMSE scores for translation and rotation are compared against those of a base model on a modified AQUALOC dataset, which is how the paper supports its claim of improved accuracy in extreme underwater environments.
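As a small numerical illustration, the sketch below evaluates the \(L_{\text{pose}}\) formula above and an RMSE score on toy data. The array shapes (one row per time step, three columns per vector), the function names, and the particular RMSE convention (square root of the mean squared Euclidean error per step) are assumptions made for illustration; the paper may define its RMSE differently.

```python
import numpy as np


def pose_loss(v_true, v_pred, phi_true, phi_pred, alpha=1.0):
    """Mean over time steps of ||v - v_hat||^2 + alpha * ||phi - phi_hat||^2."""
    v_err = np.sum((np.asarray(v_true) - np.asarray(v_pred)) ** 2, axis=1)
    phi_err = np.sum((np.asarray(phi_true) - np.asarray(phi_pred)) ** 2, axis=1)
    return float(np.mean(v_err + alpha * phi_err))


def rmse(true, pred):
    """Square root of the mean squared Euclidean error per time step (assumed convention)."""
    sq_err = np.sum((np.asarray(true) - np.asarray(pred)) ** 2, axis=1)
    return float(np.sqrt(np.mean(sq_err)))


# Toy data with T - 1 = 2 time steps; the values are made up for the example.
v_true = np.zeros((2, 3))
v_pred = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
phi_true = np.zeros((2, 3))
phi_pred = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])

# Squared errors per step are [1, 4] for translation and [1, 1] for rotation.
print(pose_loss(v_true, v_pred, phi_true, phi_pred, alpha=2.0))  # mean of [3, 6] = 4.5
print(rmse(v_true, v_pred))                                      # sqrt(2.5)
```

The weight `alpha` matters because translation and rotation errors are in different units; a larger `alpha` makes the rotation term count for more in the combined loss.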