Abstract: Self-supervised monocular visual odometry has the crucial advantage of not requiring ground-truth labels, and it has shown strong performance in autonomous driving and robotics. However, recent methods rely on coarse semantic masks to handle dynamic objects, which limits their feature representations and reduces their accuracy in dynamic environments. In contrast to these coarse-grained methods, we present Fine-MVO, a novel self-supervised monocular visual odometry method that handles dynamic objects through implicit fine-grained feature representations, achieving high accuracy and robustness in dynamic environments. First, Fine-MVO introduces an efficient cross-feature augmentation module and a novel loss weight balance strategy that exploit fine-grained features carrying implicit semantic information, which substantially improves depth estimation accuracy, especially at object boundaries. Second, we design a novel pose-feature enhancement module and a two-stage training policy that lead the pose network to focus on robust static regions and temporal information, improving pose estimation in dynamic and long-term environments. Extensive experiments demonstrate the accuracy and generalization of Fine-MVO. Specifically, Fine-MVO improves pose accuracy by 36.80% over the state-of-the-art method on the KITTI dataset, even surpassing geometry-based visual odometry methods that use loop closure. Furthermore, Fine-MVO generalizes well to the outdoor AirDOS-Shibuya dataset, improving on the current advanced method by 28.22%, and it also generalizes well to the indoor TUM-RGBD dataset.
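The abstract does not spell out Fine-MVO's training objective. As general background on how self-supervised monocular VO avoids labels: many methods (Monodepth2 is a well-known example) warp a source frame into the target view using predicted depth and pose, then penalize a per-pixel photometric error that mixes SSIM and L1. Because that warp assumes a static scene, pixels on independently moving objects keep a high error even when the camera pose is correct, which is why dynamic objects need special handling. The sketch below computes only the photometric error between a target image and an already-warped image; the view-synthesis step is omitted, and the constants (alpha = 0.85, 3x3 window, c1, c2) are the common Monodepth2 choices, assumed here rather than taken from Fine-MVO.

```python
import numpy as np


def box_mean3(x: np.ndarray) -> np.ndarray:
    """3x3 local mean with reflection padding; output has the same shape as x."""
    padded = np.pad(x, 1, mode="reflect")
    h, w = x.shape
    acc = np.zeros((h, w), dtype=np.float64)
    # Sum the nine shifted copies of the image, then divide by the window size.
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def ssim_map(x: np.ndarray, y: np.ndarray,
             c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> np.ndarray:
    """Per-pixel SSIM over 3x3 windows (grayscale images assumed in [0, 1])."""
    mu_x, mu_y = box_mean3(x), box_mean3(y)
    var_x = box_mean3(x * x) - mu_x ** 2
    var_y = box_mean3(y * y) - mu_y ** 2
    cov_xy = box_mean3(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def photometric_error(target: np.ndarray, warped: np.ndarray,
                      alpha: float = 0.85) -> np.ndarray:
    """Per-pixel alpha * (1 - SSIM) / 2 + (1 - alpha) * |target - warped|."""
    target = np.asarray(target, dtype=np.float64)
    warped = np.asarray(warped, dtype=np.float64)
    # Structural dissimilarity term, clipped to [0, 1] as in common implementations.
    dssim = np.clip((1.0 - ssim_map(target, warped)) / 2.0, 0.0, 1.0)
    # Absolute intensity difference term.
    l1 = np.abs(target - warped)
    return alpha * dssim + (1.0 - alpha) * l1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((8, 10))
    # A brightness change stands in for a misaligned (e.g. moving-object) region.
    shifted = np.clip(frame + 0.1, 0.0, 1.0)
    print(photometric_error(frame, frame).max())     # aligned images: zero error
    print(photometric_error(frame, shifted).mean())  # mismatch: positive error
```

In a full pipeline this per-pixel map would be averaged (often after per-pixel minimum selection or masking) to form the training loss; how Fine-MVO weights or modifies it is not stated in the abstract.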

Fine-MVO: Toward Fine-Grained Feature Enhancement for Self-Supervised Monocular Visual Odometry in Dynamic Environments

Self-supervised Visual-LiDAR Odometry with Flip Consistency

DeepAVO: Efficient Pose Refining with Feature Distilling for Deep Visual Odometry

A self-supervised monocular odometry with visual-inertial and depth representations

Feature Regions Segmentation Based RGB-D Visual Odometry in Dynamic Environment

PVO: Panoptic Visual Odometry

Self-supervised deep monocular visual odometry and depth estimation with observation variation

BEV-ODOM: Reducing Scale Drift in Monocular Visual Odometry with BEV Representation

Robust Monocular SLAM in Dynamic Environments

Salient Sparse Visual Odometry With Pose-Only Supervision

Improving Monocular Visual Odometry Using Learned Depth

GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model

Unsupervised Monocular Visual-Inertial Odometry Network

A Monocular Visual Odometry Combining Edge Enhance with Deep Learning

SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery

A high-precision self-supervised monocular visual odometry in foggy weather based on robust cycled generative adversarial networks and multi-task learning aided depth estimation

Self-Supervised Deep Visual Odometry with Online Adaptation

Self-Improving Visual Odometry

Self-Supervised monocular visual odometry based on cross-correlation

MD2VO: Enhancing Monocular Visual Odometry Through Minimum Depth Difference

Deep Visual Odometry with Adaptive Memory