MVINS: Tightly-Coupled Mocap-Visual-Inertial Fusion for Global and Drift-Free Pose Estimation
Meng Liu, Liang Xie, Wei Wang, Zhongchen Shi, Wei Chen, Ye Yan, Erwei Yin
DOI: https://doi.org/10.1109/jiot.2024.3367417
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Augmented reality (AR), a prominent application in the Internet of Things (IoT) domain, demands high-performance pose estimation. The visual-inertial navigation system (VINS) is currently recognized as an essential method for providing 6-DoF poses. However, VINS establishes its local frame arbitrarily during system initialization, which makes it difficult to relate that frame to the global frame. In addition, VINS is prone to drift. In this paper, we propose a method that tightly couples markerless motion capture (Mocap) with vision and an IMU to achieve global, drift-free pose estimation for AR glasses. To initialize the pose and establish the connection between the IMU and Mocap, we introduce a coarse-to-fine initialization strategy that enables Mocap, visual, and inertial data to be fused in a unified global frame. Furthermore, we formulate a Mocap factor alongside the visual and inertial factors and integrate them into a factor graph framework to constrain the system states. A spatiotemporal calibration method estimates the IMU-Mocap extrinsic parameters and time offset online to improve pose estimation accuracy. Real-world experiments demonstrate that our method accurately estimates drift-free poses in the global frame. Compared with the state-of-the-art VINS-Fusion, ORB-SLAM3, and GVIS, our method improves translation accuracy by 81%, 42%, and 33% and rotation accuracy by 58%, 33%, and 72%, respectively. Moreover, we evaluate our system on the EuRoC dataset, further demonstrating the effectiveness of the proposed work.
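The abstract does not give the exact form of the Mocap factor. As a purely illustrative sketch, and not the authors' formulation, a 6-D Mocap pose residual that includes an IMU-Mocap extrinsic and a time offset could look like the code below. It assumes that the Mocap system reports the pose of a rigid body attached to the glasses directly in the global frame, that a Mocap sample stamped at the state time describes the pose at (state time + `td`), and that the offset is compensated with a first-order constant-velocity shift, a common trick in VIO time-offset calibration. All names (`mocap_residual`, `R_im`, `p_im`, `td`, etc.) are hypothetical.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-10:
        return np.eye(3) + hat(w)  # first-order approximation near identity
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def so3_log(R):
    """Rotation matrix -> rotation vector (not robust for angles near pi)."""
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-10:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee


def mocap_residual(R_wi, p_wi, v_w, omega_i, R_im, p_im, td, R_wm_meas, p_wm_meas):
    """Hypothetical 6-D Mocap pose residual [rotation; position].

    R_wi, p_wi : IMU pose in the global (Mocap) frame at the state time
    v_w        : IMU velocity expressed in the global frame
    omega_i    : angular velocity expressed in the IMU frame
    R_im, p_im : assumed extrinsic pose of the Mocap rigid body in the IMU frame
    td         : assumed time offset (s); the Mocap sample is taken to
                 describe the pose at (state time + td)
    """
    # Shift the IMU pose by td under a constant-velocity assumption.
    R_wi_td = R_wi @ so3_exp(omega_i * td)
    p_wi_td = p_wi + v_w * td
    # Predict the Mocap rigid-body pose through the extrinsic.
    R_wm_pred = R_wi_td @ R_im
    p_wm_pred = p_wi_td + R_wi_td @ p_im
    # Rotation error on SO(3) and position error in the global frame.
    r_rot = so3_log(R_wm_meas.T @ R_wm_pred)
    r_pos = p_wm_pred - p_wm_meas
    return np.concatenate([r_rot, r_pos])


if __name__ == "__main__":
    # Synthetic, self-consistent example (all values are made up).
    R_wi = so3_exp(np.array([0.1, -0.2, 0.3]))
    p_wi = np.array([1.0, 2.0, 3.0])
    v_w = np.array([0.5, 0.0, -0.1])
    omega_i = np.array([0.0, 0.2, 0.1])
    R_im = so3_exp(np.array([0.05, 0.0, -0.02]))
    p_im = np.array([0.02, -0.01, 0.05])
    td_true = 0.01

    # Generate a "measurement" from the same model with the true offset.
    R_meas = R_wi @ so3_exp(omega_i * td_true) @ R_im
    p_meas = p_wi + v_w * td_true + R_wi @ so3_exp(omega_i * td_true) @ p_im

    r_ok = mocap_residual(R_wi, p_wi, v_w, omega_i, R_im, p_im, td_true, R_meas, p_meas)
    r_bad = mocap_residual(R_wi, p_wi, v_w, omega_i, R_im, p_im, 0.0, R_meas, p_meas)
    print(np.allclose(r_ok, 0.0))       # True: consistent offset gives zero residual
    print(np.linalg.norm(r_bad) > 0.0)  # True: ignoring the offset leaves an error
```

In a factor graph, a residual of this kind would typically be weighted by the Mocap measurement covariance and minimized jointly with the visual and inertial factors, with the extrinsic and `td` added as optimization variables to obtain online spatiotemporal calibration. A practical solver would also need analytic or automatic Jacobians, which this sketch omits.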