Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization

Federico Rollo, Gennaro Raiola, Andrea Zunino, Nikolaos Tsagarakis, Arash Ajoudani
2023-11-22
Abstract: Geometric navigation is nowadays a well-established field of robotics and the research focus is shifting towards higher-level scene understanding, such as Semantic Mapping. When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localising objects within a map, which is under construction (SLAM) or already built. To further explore this direction, we propose a framework that can autonomously detect and localize predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar). The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing, while 85% and 80% of the objects were mapped using the single RGB-D camera or RGB + lidar setup respectively. The comparison with single-sensor (camera or lidar) experiments is performed to show that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses object detection and localization for mobile robots in unstructured environments, either while a map is being built (SLAM) or on a map that already exists. Specifically, the research objectives include:

- **Multimodal Semantic Mapping**: Develop a framework capable of autonomously detecting and localizing predefined objects in a known environment. The framework combines RGB image data with depth data (from an RGB-D camera and a LiDAR) to achieve more accurate object perception.
- **Improving Detection Accuracy**: Enhance the detection accuracy of both near and far obstacles by fusing data from different sensors (such as RGB cameras and LiDAR).
- **Handling Sensor Errors**: Manage noise and outliers in sensor measurements so that object position estimates stay stable and reliable even while the robot is in motion.
- **Real-time Application**: Design a system that can run in real time on low-resource devices, making it suitable for embedded systems.

The core contribution is a multimodal (RGB-D camera and LiDAR) online semantic mapping framework that fuses sensor information in real time according to the distance of each object and the precision of each sensor. The paper also provides a user interface (UI) application that lets users interact with objects on the map and thereby command the robot to perform specific tasks (such as grasping or inspecting).

The experimental section reports that the proposed framework detects and localizes objects in both simulated and real environments, and that it performs better than either single-sensor setup.
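The distance- and precision-dependent fusion and the outlier handling described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the paper's implementation: the function names (`camera_sigma`, `lidar_sigma`, `fuse_depth`, `stabilise`), the noise constants, and the choice of inverse-variance weighting plus median filtering are all hypothetical stand-ins for whatever the authors actually use.

```python
import statistics

# ASSUMED noise models (not taken from the paper):
# RGB-D depth error is assumed to grow quadratically with range,
# while lidar range error is assumed to be roughly constant.
def camera_sigma(depth_m, k=0.002, floor=0.001):
    """Hypothetical standard deviation (m) of an RGB-D depth reading."""
    return max(k * depth_m ** 2, floor)

def lidar_sigma(depth_m, sigma=0.03):
    """Hypothetical standard deviation (m) of a lidar range reading."""
    return sigma

def fuse_depth(camera_depth, lidar_depth):
    """Inverse-variance weighted average of two depth readings in metres.

    Either reading may be None when that sensor has no valid return,
    in which case the other reading is passed through unchanged.
    """
    if camera_depth is None:
        return lidar_depth
    if lidar_depth is None:
        return camera_depth
    w_cam = 1.0 / camera_sigma(camera_depth) ** 2
    w_lid = 1.0 / lidar_sigma(lidar_depth) ** 2
    return (w_cam * camera_depth + w_lid * lidar_depth) / (w_cam + w_lid)

def stabilise(samples, max_dev=0.5):
    """Median of the samples that lie within max_dev of the raw median.

    A simple outlier rejection step for repeated observations of the
    same object; max_dev (metres) is an assumed tuning parameter.
    """
    med = statistics.median(samples)
    kept = [s for s in samples if abs(s - med) <= max_dev]
    return statistics.median(kept)

if __name__ == "__main__":
    print(fuse_depth(1.0, 1.2))   # near object: camera reading dominates
    print(fuse_depth(8.0, 8.5))   # far object: lidar reading dominates
    print(stabilise([2.0, 2.1, 1.9, 9.0, 2.05]))  # 9.0 is discarded
```

With these assumed noise models the weighting shifts smoothly from the camera at short range to the lidar at long range, which mirrors the qualitative behaviour the summary attributes to the framework.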