VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks

Yutong Wang, Chaoyang Jiang, Xieyuanli Chen
2024-02-21
Abstract: In recent years, object-oriented simultaneous localization and mapping (SLAM) has attracted increasing attention due to its ability to provide high-level semantic information while maintaining computational efficiency. Some researchers have attempted to enhance localization accuracy by integrating the modeled object residuals into bundle adjustment. However, few have demonstrated better results than feature-based visual SLAM systems, as the generic coarse object models, such as cuboids or ellipsoids, are less accurate than feature points. In this paper, we propose a Visual Object Odometry and Mapping framework, VOOM, which uses high-level objects and low-level points as hierarchical landmarks in a coarse-to-fine manner instead of directly using object residuals in bundle adjustment. Firstly, we introduce an improved observation model and a novel data association method for dual quadrics, which are employed to represent physical objects. Together, they facilitate the creation of a 3D map that closely reflects reality. Next, we use object information to enhance the data association of feature points and consequently update the map. In the visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. Meanwhile, local bundle adjustment is performed utilizing the object-based and point-based covisibility graphs in our visual object mapping process. Experiments show that VOOM outperforms both object-oriented SLAM and feature-point-based SLAM systems such as ORB-SLAM2 in terms of localization. The implementation of our method is available at https://github.com/yutongwangBIT/VOOM.git.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The main problem this paper addresses is improving the accuracy and robustness of vision-based simultaneous localization and mapping (SLAM) systems. It proposes a framework named VOOM (Visual Object Odometry and Mapping) that combines high-level object information with low-level feature points. Traditional SLAM systems rely mainly on feature points as landmarks, which yields high accuracy but no semantic information. Existing object-level SLAM methods try to improve localization accuracy by integrating object residuals into bundle adjustment, but the gains are small and the results are sometimes worse than those of feature-point-based methods, because coarse object models such as cuboids or ellipsoids are less accurate than feature points.

The core contributions of VOOM are as follows:

1. **A new visual object odometry and mapping framework**: The framework uses feature points and dual quadrics together as hierarchical landmarks, improving the accuracy of localization and mapping in a coarse-to-fine manner.
2. **Effective algorithms**: Algorithms for object optimization, object association, and object-based map-point association are designed to construct a map with hierarchical landmarks.
3. **Experimental verification**: Experiments show that the proposed method outperforms state-of-the-art feature-point-based visual SLAM systems, such as ORB-SLAM2, in localization accuracy.

The specific implementation of the paper includes:

- **Observation model**: An improved observation model and a novel data association method are introduced for dual quadrics, which represent physical objects. These help create a 3D map that closely reflects reality.
- **Data association**: Object information is used to enhance the data association of feature points. The updated map is then used to further optimize the camera pose and the objects.
- **Local bundle adjustment**: During visual object mapping, covisibility graphs built from objects and points are used in local bundle adjustment to optimize the poses of local keyframes and the positions of map points.

Through these innovations, VOOM not only surpasses traditional feature-point SLAM systems in localization accuracy, but also shows stronger robustness when dealing with dynamic objects and long sequences.
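To make the dual-quadric object representation mentioned above more concrete, the following is a minimal sketch of the textbook projection of an ellipsoid's dual quadric into an image, using the standard relation C* = P Q* Pᵀ from multi-view geometry. It is not VOOM's improved observation model, which this page does not specify; the function names, the example intrinsics, and the sphere placed 5 m in front of the camera are all hypothetical choices made for illustration.

```python
import numpy as np

def ellipsoid_dual_quadric(semi_axes, R_wo, t_wo):
    """Build the 4x4 dual quadric Q* of an ellipsoid posed in the world frame.

    semi_axes: (a, b, c) half-lengths; R_wo, t_wo: object-to-world rotation/translation.
    (Hypothetical helper, not taken from the VOOM code base.)
    """
    a, b, c = semi_axes
    Z = np.eye(4)
    Z[:3, :3] = np.asarray(R_wo, dtype=float)
    Z[:3, 3] = np.asarray(t_wo, dtype=float)
    D = np.diag([a * a, b * b, c * c, -1.0])  # canonical dual quadric at the origin
    return Z @ D @ Z.T

def project_dual_quadric(Q_star, K, R_cw, t_cw):
    """Project a dual quadric to a 3x3 dual conic: C* = P Q* P^T, with P = K [R | t]."""
    t = np.asarray(t_cw, dtype=float).reshape(3, 1)
    P = np.asarray(K, dtype=float) @ np.hstack([np.asarray(R_cw, dtype=float), t])
    return P @ Q_star @ P.T

def conic_bbox(C_star):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of the projected ellipse.

    Vertical and horizontal tangent lines l satisfy l^T C* l = 0.
    """
    C = C_star / C_star[2, 2]          # scale so that C[2, 2] == 1
    cx, cy = C[0, 2], C[1, 2]          # ellipse center in pixels
    half_w = np.sqrt(cx * cx - C[0, 0])
    half_h = np.sqrt(cy * cy - C[1, 1])
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h

if __name__ == "__main__":
    # Assumed pinhole intrinsics and a unit sphere 5 m in front of the camera.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    Q = ellipsoid_dual_quadric((1.0, 1.0, 1.0), np.eye(3), [0.0, 0.0, 5.0])
    C = project_dual_quadric(Q, K, np.eye(3), np.zeros(3))
    print(conic_bbox(C))
```

In an object-level SLAM system, a box predicted this way is typically compared against a detector's bounding box to form an object residual or an association score. How VOOM defines that comparison is described in the paper itself, not here.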