Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization

Federico Rollo, Gennaro Raiola, Andrea Zunino, Nikolaos Tsagarakis, Arash Ajoudani

2023-11-22

Abstract: Geometric navigation is nowadays a well-established field of robotics, and the research focus is shifting towards higher-level scene understanding, such as Semantic Mapping. When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localising objects within a map that is either under construction (SLAM) or already built. To further explore this direction, we propose a framework that can autonomously detect and localize predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar). The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing, while 85% and 80% of the objects were mapped using the single RGB-D camera or the RGB + lidar setup, respectively. The comparison with single-sensor (camera or lidar) experiments shows that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.

Robotics, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper primarily aims to address the problem of object detection and localization for mobile robots in unstructured environments, either while a map is being built (SLAM) or on an already constructed map. Specifically, the research objectives include:

- **Multimodal Semantic Mapping**: Develop a framework capable of autonomously detecting and localizing predefined objects in a known environment. The framework combines RGB image data with depth data (from an RGB-D camera and a LiDAR) to achieve more accurate object perception.
- **Improving Detection Accuracy**: Enhance the detection accuracy of both near and far obstacles by fusing data from different sensors (such as RGB cameras and LiDAR).
- **Handling Sensor Errors**: Manage noise and outliers in sensor measurements so that object position estimates remain stable and reliable even while the robot is in motion.
- **Real-time Application**: Design a system that can run in real time on low-resource devices, making it suitable for embedded systems.

The core contribution of the research is a multimodal (RGB-D camera and LiDAR) online semantic mapping framework that fuses sensor information in real time based on the distance of objects and the precision of the sensors. Additionally, the paper provides a user interface (UI) application that lets users interact with objects on the map and thereby command the robot to perform specific tasks (such as grasping or inspecting). The experimental section validates that the proposed framework can effectively detect and localize objects in both simulated and real environments, and that it outperforms single-sensor setups.
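To make the idea of fusing sensors "based on the distance of objects and the precision of the sensors" concrete, the following is a minimal sketch of one common way to do it: inverse-variance weighting of two depth readings. It is not taken from the paper. The function names (`rgbd_sigma`, `lidar_sigma`, `fuse_depth`), the noise models, and the constants are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the noise models and constants below are
# assumptions, not values or formulas from the paper.

def rgbd_sigma(z, k=0.0025):
    """Assumed RGB-D depth noise (m): grows quadratically with range z (m)."""
    return k * z * z


def lidar_sigma(z, base=0.03):
    """Assumed lidar range noise (m): roughly constant over range."""
    return base


def fuse_depth(z_rgbd, z_lidar):
    """Inverse-variance weighted fusion of two depth readings.

    Either reading may be None (sensor has no valid return); returns None
    when both are missing.
    """
    estimates = []
    if z_rgbd is not None:
        estimates.append((z_rgbd, rgbd_sigma(z_rgbd) ** 2))
    if z_lidar is not None:
        estimates.append((z_lidar, lidar_sigma(z_lidar) ** 2))
    if not estimates:
        return None
    # Each reading is weighted by 1 / variance, so the more precise
    # sensor at this range dominates the fused value.
    weight_sum = sum(1.0 / var for _, var in estimates)
    return sum(z / var for z, var in estimates) / weight_sum
```

Under these assumed noise models, a nearby object is localized mostly from the RGB-D reading, while a distant one is localized mostly from the lidar reading, which mirrors the near/far behaviour the summary describes.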

Multi-Sensor Fusion Tomato Picking Robot Localization and Mapping Research

Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

Multi-LiDAR Mapping for Scene Segmentation in Indoor Environments for Mobile Robots

Object-aware Semantic Mapping of Indoor Scenes Using Octomap

Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data

Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues

Multi-sensor fusion for robust localization with moving object segmentation in complex dynamic 3D scenes

Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots

Building and optimization of 3D semantic map based on Lidar and camera fusion

Multimodal sensor-based semantic 3D mapping for a large-scale environment

IMU and Multiple RGB-D Camera Fusion for Assisting Indoor Stop-and-Go 3D Terrestrial Laser Scanning

A Novel Multi-Sensor Nonlinear Tightly-Coupled Framework for Composite Robot Localization and Mapping

Increasing the Efficiency of 6-DoF Visual Localization Using Multi-Modal Sensory Data

A Multi-Sensor Fusion Framework for Localization Using LiDAR, IMU and RGB-D Camera

MD-SLAM: Multi-cue Direct SLAM

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map

Pose‐graph underwater simultaneous localization and mapping for autonomous monitoring and 3D reconstruction by means of optical and acoustic sensors

Environment Mapping Using Sensor Fusion of 2D Laser Scanner and 3D Ultrasonic Sensor for a Real Mobile Robot

Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization