Fine-MVO: Toward Fine-Grained Feature Enhancement for Self-Supervised Monocular Visual Odometry in Dynamic Environments

Wenhui Wei, Yang Ping, Jiadong Li, Xin Liu, Yangfan Zhou
DOI: https://doi.org/10.1109/tits.2024.3404924
2024-01-01
Abstract: Self-supervised monocular visual odometry has the crucial advantage of not depending on labels and has shown strong performance in autonomous driving and robotics. However, recent methods suffer from limited feature representations because they depend on coarse semantic masks to handle dynamic objects, which reduces accuracy in dynamic environments. In contrast to these coarse-grained methods, we present Fine-MVO, a novel self-supervised monocular visual odometry method that addresses dynamic objects using implicit fine-grained feature representations, thus achieving excellent accuracy and robustness in dynamic environments. First, Fine-MVO provides an efficient cross-feature augmentation module and a novel loss weight balance strategy to effectively leverage fine-grained features with implicit semantic information, leading to a substantial improvement in depth estimation accuracy, especially on object boundaries. Second, we design a novel pose-feature enhancement module and an effective two-stage training policy that enable the pose network to focus on robust static regions and temporal information, thereby enhancing pose estimation performance in dynamic and long-term environments. Extensive experimental results demonstrate the accuracy and generalization of Fine-MVO. Specifically, Fine-MVO achieves a 36.80% improvement in pose accuracy over the state-of-the-art method on the KITTI dataset, even exceeding the performance of geometry-based visual odometry methods with loop closure. Furthermore, Fine-MVO generalizes well to the outdoor dataset AirDOS-Shibuya, attaining an improvement of 28.22% over the current advanced method. Fine-MVO also shows outstanding generalization on the indoor dataset TUM-RGBD.