Abstract: In recent years, object-oriented simultaneous localization and mapping (SLAM) has attracted increasing attention due to its ability to provide high-level semantic information while maintaining computational efficiency. Some researchers have attempted to enhance localization accuracy by integrating the modeled object residuals into bundle adjustment. However, few have demonstrated better results than feature-based visual SLAM systems, as the generic coarse object models, such as cuboids or ellipsoids, are less accurate than feature points. In this paper, we propose a Visual Object Odometry and Mapping framework, VOOM, which uses high-level objects and low-level points as hierarchical landmarks in a coarse-to-fine manner instead of directly using object residuals in bundle adjustment. First, we introduce an improved observation model and a novel data association method for dual quadrics, which are employed to represent physical objects. This facilitates the creation of a 3D map that closely reflects reality. Next, we use object information to enhance the data association of feature points and consequently update the map. In the visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. Meanwhile, local bundle adjustment is performed using the object-based and point-based covisibility graphs in our visual object mapping process. Experiments show that VOOM outperforms both object-oriented SLAM and feature-point SLAM systems such as ORB-SLAM2 in terms of localization. The implementation of our method is available at https://github.com/yutongwangBIT/VOOM.git.

What problem does this paper attempt to address?

The paper aims to improve the accuracy and robustness of vision-based simultaneous localization and mapping (SLAM). It proposes a framework named VOOM (Visual Object Odometry and Mapping) that combines high-level object information with low-level feature points. Traditional SLAM systems rely mainly on feature points as landmarks, which gives high accuracy but no semantic information. Existing object-level SLAM methods try to improve localization accuracy by integrating object residuals into bundle adjustment, but the gains are small and the results are sometimes worse than those of feature-point-based methods, because generic coarse object models such as cuboids or ellipsoids are less accurate than feature points.

The core contributions of VOOM are as follows:

1. **A new visual object odometry and mapping framework**: The framework uses feature points and dual quadrics together as hierarchical landmarks, improving the accuracy of localization and mapping in a coarse-to-fine manner.
2. **Effective algorithms**: Algorithms for object optimization, object association, and object-based map-point association are designed to construct a map with hierarchical landmarks.
3. **Experimental verification**: Experiments show that the proposed method achieves better localization accuracy than state-of-the-art feature-point-based visual SLAM systems such as ORB-SLAM2.

The specific implementation of the paper includes:

- **Observation model**: An improved observation model and a novel data-association method are introduced for dual quadrics, which represent physical objects. They help create a 3D map that is closer to reality.
- **Data association**: Object information is used to enhance the data association of feature points. The updated map is then used to further optimize the camera pose and the objects.
- **Local bundle adjustment**: During visual object mapping, local bundle adjustment uses the covisibility graphs built from objects and points to optimize the poses of local keyframes and the positions of map points.

Through these innovations, VOOM not only surpasses traditional feature-point SLAM systems in localization accuracy, but also shows stronger robustness when dealing with dynamic objects and long-sequence data.
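To make the dual-quadric object representation mentioned above more concrete, the following minimal sketch shows the standard textbook way an ellipsoid, written as a dual quadric Q*, projects into an image as a dual conic C* = P Q* Pᵀ, and how an axis-aligned bounding box can be read off that conic. This is not VOOM's improved observation model, which the summary does not specify. The function names, argument names, and camera values are hypothetical and chosen only for illustration.

```python
import numpy as np


def ellipsoid_to_dual_quadric(semi_axes, R_wo, t_wo):
    """Return the 4x4 dual quadric Q* of an ellipsoid in world coordinates.

    semi_axes: (a, b, c) half-lengths of the ellipsoid.
    R_wo, t_wo: object-to-world rotation (3x3) and ellipsoid centre (3,).
    """
    Z = np.eye(4)
    Z[:3, :3] = R_wo
    Z[:3, 3] = t_wo
    a, b, c = semi_axes
    # Dual quadric of an origin-centred, axis-aligned ellipsoid.
    Q_canonical = np.diag([a * a, b * b, c * c, -1.0])
    return Z @ Q_canonical @ Z.T


def project_dual_quadric(Q, K, R_cw, t_cw):
    """Project Q* into the image as a dual conic C* = P Q* P^T, with P = K [R | t]."""
    P = K @ np.hstack([R_cw, np.reshape(t_cw, (3, 1))])
    return P @ Q @ P.T


def conic_bounding_box(C):
    """Axis-aligned box (u_min, v_min, u_max, v_max) tangent to the dual conic C*."""
    C = C / C[2, 2]  # rescale so that C[2, 2] == 1
    # A vertical line x = u is tangent when u^2 - 2*C[0,2]*u + C[0,0] = 0;
    # the horizontal case is analogous with the second row and column.
    disc_u = C[0, 2] ** 2 - C[0, 0]
    disc_v = C[1, 2] ** 2 - C[1, 1]
    if disc_u < 0 or disc_v < 0:
        raise ValueError("conic has no real outline for this camera pose")
    ru, rv = np.sqrt(disc_u), np.sqrt(disc_v)
    return (C[0, 2] - ru, C[1, 2] - rv, C[0, 2] + ru, C[1, 2] + rv)


if __name__ == "__main__":
    # Hypothetical example: unit sphere 5 m in front of a camera at the origin.
    K = np.array([[100.0, 0.0, 320.0],
                  [0.0, 100.0, 240.0],
                  [0.0, 0.0, 1.0]])
    Q = ellipsoid_to_dual_quadric((1.0, 1.0, 1.0), np.eye(3), np.array([0.0, 0.0, 5.0]))
    C = project_dual_quadric(Q, K, np.eye(3), np.zeros(3))
    print(conic_bounding_box(C))
```

In an object-level pipeline, a box predicted this way is typically compared with a detector's bounding box to form an object residual or an association score. How VOOM defines its own residual and association criteria is described in the paper and its repository, not here.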

VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks

Design of an Enhanced Visual Odometry by Building and Matching Compressive Panoramic Landmarks Online

Robust Indoor Localization and Map Matching Algorithm with Visual SLAM

ObVi-SLAM: Long-Term Object-Visual SLAM

OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving

Monocular SLAM for Large Scale Scenes

QISO-SLAM: Object-Oriented SLAM Using Dual Quadrics as Landmarks Based on Instance Segmentation

Object SLAM Based on Spatial Layout and Semantic Consistency

VDO-SLAM: A Visual Dynamic Object-aware SLAM System

W-VSLAM: A Visual Mapping Algorithm for Indoor Inspection Robots

OL-SLAM: A Robust and Versatile System of Object Localization and SLAM

VR-SLAM: A Visual-Range Simultaneous Localization and Mapping System using Monocular Camera and Ultra-wideband Sensors

Contour-SLAM: A Robust Object-Level SLAM Based on Contour Alignment

Semantic Object-level Modeling for Robust Visual Camera Relocalization

POU-SLAM: Scan-to-Model Matching Based on 3D Voxels

MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

An Omni-directional Vslam Based on Spherical Camera Model and 3D Modeling

A Tightly Coupled Monocular Visual Lidar Odometry with Loop Closure

Enhancing Real-Time Visual SLAM with Distant Landmarks in Large-Scale Environments

Voxel-SLAM: A Complete, Accurate, and Versatile LiDAR-Inertial SLAM System

OVD-SLAM: An Online Visual SLAM for Dynamic Environments