Gravity-Shift-VIO: Adaptive Acceleration Shift and Multi-Modal Fusion with Transformer in Visual-Inertial Odometry

Jiale Chen, Shuang Zhang, Zhiheng Li, Xin Jin
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191248
2023-01-01
Abstract: Visual-inertial odometry (VIO) estimates the 6-degree-of-freedom (6-DoF) ego-motion of an agent from sequential camera and inertial measurement unit (IMU) data. The acceleration measured by IMUs is affected by gravity, which traditional VIO approaches typically address through initialization methods. However, this problem has received little attention in recent end-to-end deep learning methods. For raw acceleration measurements, gravity causes overlap between different motion patterns, degrading the representation embedding and limiting pose estimation performance. In this paper, we propose Gravity-Shift-VIO, an attention-based approach that addresses this issue by adaptively shifting the acceleration vector before the representation embedding. In addition, a cross-frame multimodal transformer is introduced to fuse multimodal information. Experiments on the KITTI dataset show that Gravity-Shift-VIO achieves strong ego-motion estimation performance. An ablation study further indicates that Gravity-Shift-VIO is highly effective in reducing the gravity-induced overlap of acceleration representations, and that the cross-frame transformer improves multi-sensor fusion and time-series feature extraction.