Abstract: Learning-based visual odometry (VO) has recently achieved remarkable success in vision-based measurement, especially in indoor robotics. However, existing methods usually underexplore geometric-semantic (G-S) information, which leads to inefficient perception in unseen dynamic environments. They are also time-consuming, since they typically rely on high-complexity semantic segmentation models, which reduces concurrency. In this article, we develop a G-S information enhanced lightweight VO (GSL-VO) that works particularly well in dynamic environments. To improve the robustness of VO through G-S information, we first propose a novel image enhancement module that tackles motion blur and thus enables accurate extraction of geometric and semantic information. Second, we design an adaptive G-S information processing module that combines geometric and semantic information to retain reliable features for pose measurement. Moreover, semantic information is expressed via a probability framework for accurate and robust extraction of movable objects. To address the speed bottleneck of VO, we further propose a lightweight semantic segmentation model with an efficient multilevel feature aggregation capability. Experiments on two well-known RGB-D dynamic datasets indicate that the proposed method is both accurate and fast: GSL-VO achieves an average improvement of 70.5% in absolute trajectory error (ATE) over state-of-the-art learning-based VO on the Bonn RGB-D Dynamic dataset while running at 22.3 FPS on a low-cost platform, which makes it well suited for practical scenarios. Remarkably, on a challenging dynamic sequence of the TUM RGB-D dataset, GSL-VO improves on the baseline VO by 88.9% in ATE.
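
For readers unfamiliar with the metric, the ATE figures quoted in the abstract can be read as a root-mean-square translational error and a relative reduction of that error against a baseline. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. It assumes the estimated and ground-truth trajectories are already time-associated and rigidly aligned (standard ATE evaluation performs that alignment first), and the function names `ate_rmse` and `relative_improvement` are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-pose translational error.

    Assumption: both trajectories are sequences of (x, y, z) positions that
    are already time-associated and rigidly aligned; no alignment is done here.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_ate, method_ate):
    """Percentage reduction in ATE relative to a baseline (higher is better)."""
    if baseline_ate <= 0:
        raise ValueError("baseline ATE must be positive")
    return 100.0 * (baseline_ate - method_ate) / baseline_ate


if __name__ == "__main__":
    # Toy positions for illustration only; these are not values from the paper.
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(ate_rmse(est, gt))
    print(relative_improvement(0.2, 0.05))
```

Under this reading, an improvement of 70.5% means the method's ATE is 29.5% of the baseline's ATE on average.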

SCVO: Scale-Consistent Depth and Pose for Unsupervised Visual Odometry

Self-supervised Visual-LiDAR Odometry with Flip Consistency

CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth

Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World

Salient Sparse Visual Odometry With Pose-Only Supervision

PVO: Panoptic Visual Odometry

MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

DF-VO: What Should Be Learnt for Visual Odometry?

Unsupervised Monocular Visual-Inertial Odometry Network

OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving

Visual Odometry Based On Semantic Supervision

XVO: Generalized Visual Odometry via Cross-Modal Self-Training

DeepAVO: Efficient Pose Refining with Feature Distilling for Deep Visual Odometry

Spatio-temporal and geometry constrained network for automobile visual odometry

Pose Refinement: Bridging the Gap Between Unsupervised Learning and Geometric Methods for Visual Odometry

Improving Monocular Visual Odometry Using Learned Depth

SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications

MD2VO: Enhancing Monocular Visual Odometry Through Minimum Depth Difference

A self-supervised monocular odometry with visual-inertial and depth representations

Deep Visual Odometry with Adaptive Memory

GSL-VO: A Geometric-Semantic Information Enhanced Lightweight Visual Odometry in Dynamic Environments