GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model

Rui Zhou, Jingbin Liu, Junbin Xie, Jianyu Zhang, Yingze Hu, Jiele Zhao
2024-11-29
Abstract: Visual-inertial odometry (VIO) is widely used in various fields, such as robots, drones, and autonomous vehicles, due to its low cost and complementary sensors. Most VIO methods presuppose that observed objects are static and time-invariant. However, real-world scenes often feature dynamic objects, compromising the accuracy of pose estimation. These moving entities include cars, trucks, buses, motorcycles, and pedestrians. The diversity and partial occlusion of these objects present a tough challenge for existing dynamic object removal techniques. To tackle this challenge, we introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into VIO, thereby improving pose estimation accuracy in environments with diverse dynamic objects and frequent occlusions. Leveraging the promptable foundation model, our solution efficiently tracks and segments a wide range of object categories. The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects, especially in urban settings with partial occlusions or swift movements. We evaluated our proposed method using multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects. The experimental results demonstrate that our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods. This highlights its remarkable generalization and adaptability in diverse dynamic environments, showcasing its potential to handle various dynamic objects in practical applications.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the loss of positioning accuracy that visual-inertial odometry (VIO) suffers in dynamic environments. Most existing VIO methods assume that observed objects are static and time-invariant, but real-world scenes often contain dynamic objects (such as cars, trucks, buses, motorcycles, and pedestrians), and these degrade pose estimation. To solve this problem, the paper introduces GMS-VINS (Generalized Multi-Category Segmentation Visual-Inertial Navigation System), an enhanced visual-inertial odometry system designed for complex dynamic environments. GMS-VINS improves pose estimation accuracy by integrating a multi-category semantic segmentation framework and an improved SORT algorithm, especially when there are multiple dynamic objects and frequent occlusions.

### Specific Problems and Solutions

1. **The influence of dynamic objects on the VIO system**:
   - Dynamic objects reduce the positioning accuracy of the VIO system and may even cause the robot positioning system to fail.
   - Traditional methods mainly rely on motion priors to eliminate the influence of dynamic objects, but they perform poorly in highly dynamic scenarios.
2. **Multi-category dynamic object segmentation and tracking**:
   - Existing dynamic object removal techniques have limited effectiveness on diverse dynamic objects, especially under partial occlusion.
   - GMS-VINS introduces a multi-category semantic segmentation model that identifies and segments multiple categories of dynamic objects, thereby reducing their influence on the VIO system.
3. **Improved SORT algorithm**:
   - The traditional SORT algorithm is not reliable enough for partially occluded or fast-moving objects in complex urban environments.
   - GMS-VINS proposes an enhanced SORT algorithm that significantly improves the reliability of multi-target tracking, especially under partial occlusion or fast movement.
4. **Experimental verification**:
   - The paper evaluates GMS-VINS on multiple public datasets and in a real-world scenario. The results show that the method performs well in various environments and outperforms other state-of-the-art methods.

### Main Contributions

1. **Propose an innovative VIO solution, GMS-VINS**, which is suited to challenging dynamic conditions and shows strong adaptability and generalization across environments.
2. **Introduce a new method based on a promptable foundation model** for tracking and segmenting multiple dynamic objects, reducing the impact of dynamic environments on VIO performance.
3. **Propose an enhanced SORT algorithm** that maintains tracking of moving objects under difficult conditions such as partial occlusion.
4. **Extensive experimental verification** shows that the pose estimation accuracy of GMS-VINS in different environments is better than that of existing methods, highlighting its potential in practical applications.

Through these improvements, GMS-VINS improves the robustness and accuracy of visual-inertial odometry in complex dynamic environments and provides more reliable support for practical applications.
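
To make the idea of reducing the influence of dynamic objects concrete, the following is a minimal sketch of one common way a segmentation result can be used in a VIO front end: feature points that fall on pixels labelled as dynamic are discarded before pose estimation. This is an illustration under stated assumptions, not the authors' implementation. The function name `filter_static_features`, the nested-list mask representation, and the choice to keep points that lie outside the mask are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a tracked feature


def filter_static_features(
    points: Sequence[Point],
    dynamic_mask: Sequence[Sequence[int]],
) -> List[Point]:
    """Return the feature points that do not lie on a dynamic-object pixel.

    dynamic_mask[row][col] is assumed to be nonzero where a segmentation
    model has labelled the pixel as part of a dynamic object.
    """
    height = len(dynamic_mask)
    width = len(dynamic_mask[0]) if height else 0
    kept: List[Point] = []
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        if inside and dynamic_mask[row][col]:
            continue  # feature lies on a segmented dynamic object: drop it
        # Assumption: points outside the mask bounds are kept unchanged.
        kept.append((x, y))
    return kept


# Toy 4x4 mask with a 2x2 dynamic region in the centre.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = [(0.0, 0.0), (1.2, 1.9), (3.0, 3.0), (2.0, 1.0)]
static_features = filter_static_features(features, mask)
print(static_features)  # [(0.0, 0.0), (3.0, 3.0)]
```

In this toy example the two features inside the central region are removed and the two corner features survive. A real system would obtain the mask from the segmentation model for each tracked object and would typically apply the filter at feature-detection time.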