Abstract: We present InstanceFusion, a robust real‐time system to detect, segment, and reconstruct instance‐level 3D objects of indoor scenes with a hand‐held RGBD camera. It combines the strengths of deep learning and traditional SLAM techniques to produce visually compelling 3D semantic models. The key success comes from our novel segmentation scheme and the efficient instance‐level data fusion, which are both implemented on GPU. Specifically, for each incoming RGBD frame, we take advantage of the RGBD features, the 3D point cloud, and the reconstructed model to perform instance‐level segmentation. The corresponding RGBD data along with the instance ID are then fused to the surfel‐based models. In order to sufficiently store and update these data, we design and implement a new data structure using the OpenGL Shading Language. Experimental results show that our method advances the state‐of‐the‐art (SOTA) methods in instance segmentation and data fusion by a large margin. In addition, our instance segmentation improves the precision of 3D reconstruction, especially in the loop closure. The InstanceFusion system runs at 20.5 Hz on a consumer‐level GPU, which supports a number of augmented reality (AR) applications (e.g., 3D model registration, virtual interaction, AR map) and robot applications (e.g., navigation, manipulation, grasping). To facilitate future research and make our system easier to reproduce, the source code, data, and the trained model are released on GitHub: https://github.com/Fancomi2017/InstanceFusion.

What problem does this paper attempt to address?

The paper addresses instance-level detection, segmentation, and reconstruction of 3D objects in indoor scenes, in real time, using a single RGBD camera. The authors propose a system named InstanceFusion, which targets several limitations of existing instance-level 3D reconstruction methods:

1. **Instance-level segmentation accuracy**: Traditional 3D reconstruction techniques often provide only geometric models and lack object-level semantic information. InstanceFusion segments 3D objects at the instance level by combining deep learning with traditional SLAM techniques.
2. **Real-time performance**: Existing methods either rely on post-processing and give no immediate feedback, or run online but cannot generate high-quality 3D models because their 2D segmentation results are inaccurate. InstanceFusion runs at 20.5 Hz on a consumer-grade GPU through an optimized two-stage segmentation algorithm and an efficient instance-level data fusion process.
3. **Robustness and automation**: The system automatically carries out the entire process, from the raw RGBD stream to the incrementally fused instance-level surface model, without any prior scene knowledge or predefined template models.
4. **Expansion of application scenarios**: InstanceFusion supports augmented reality (AR) applications such as 3D model registration, virtual interaction, and AR maps, and is also suitable for robot tasks such as navigation, manipulation, and grasping.

In summary, InstanceFusion aims to provide an efficient, accurate, real-time method for instance-level detection, segmentation, and reconstruction of 3D objects in indoor scenes, in support of AR and robot applications.
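The abstract states that each frame's RGBD data and instance ID are fused into a surfel-based model, but it does not give the update rule. As a rough illustration only, the sketch below assumes a confidence-weighted running average for the surfel position and a per-instance vote tally for the label. The `Surfel` class, its fields, and the `fuse` method are hypothetical names introduced here. The actual system implements its data structure in the OpenGL Shading Language on the GPU, not in Python.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Surfel:
    """Hypothetical surfel carrying geometry plus instance-label votes."""
    position: Vec3
    weight: float = 0.0
    # Accumulated confidence per instance ID (assumed bookkeeping scheme).
    votes: Dict[int, float] = field(default_factory=dict)

    def fuse(self, position: Vec3, instance_id: Optional[int],
             confidence: float = 1.0) -> None:
        """Blend one new measurement into this surfel."""
        if confidence <= 0.0:
            raise ValueError("confidence must be positive")
        new_weight = self.weight + confidence
        # Confidence-weighted running average of the position.
        self.position = tuple(
            (self.weight * old + confidence * new) / new_weight
            for old, new in zip(self.position, position)
        )
        self.weight = new_weight
        # Unlabelled measurements update geometry only.
        if instance_id is not None:
            self.votes[instance_id] = (
                self.votes.get(instance_id, 0.0) + confidence
            )

    @property
    def instance_id(self) -> Optional[int]:
        """Instance ID with the highest accumulated confidence, if any."""
        if not self.votes:
            return None
        return max(self.votes, key=self.votes.get)


surfel = Surfel(position=(0.0, 0.0, 0.0))
surfel.fuse((1.0, 2.0, 3.0), instance_id=7)
surfel.fuse((3.0, 2.0, 1.0), instance_id=9)
surfel.fuse((2.0, 2.0, 2.0), instance_id=9, confidence=0.5)
print(surfel.position, surfel.instance_id)
```

Keeping a vote tally rather than a single label lets a surfel recover from an occasional wrong 2D segmentation, at the cost of extra per-surfel storage. Whether InstanceFusion makes this trade-off is not stated in the text above.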

InstanceFusion: Real‐time Instance‐level 3D Reconstruction Using a Single RGBD Camera

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

3DFusion, A real-time 3D object reconstruction pipeline based on streamed instance segmented data

Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation.

Online Global Non-rigid Registration for 3D Object Reconstruction Using Consumer-level Depth Cameras

Mobile3DScanner: an Online 3D Scanner for High-quality Object Reconstruction with a Mobile Device

Robust 3D Reconstruction with an RGB-D Camera

Saliency-aware Real-time Volumetric Fusion for Object Reconstruction.

HeteroFusion: Dense Scene Reconstruction Integrating Multi-Sensors.

FastFusion: Real-Time Indoor Scene Reconstruction with Fast Sensor Motion

Robust Keyframe-based Dense SLAM with an RGB-D Camera.

FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything

RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors

Research on Indoor 3D Reconstruction Technology Based on Semantic Visual Simultaneous Localization and Mapping

3D real-time human reconstruction with a single RGBD camera

DetectFusion: Detecting and Segmenting Both Known and Unknown Dynamic Objects in Real-time SLAM

RGB-Fusion: Monocular 3D reconstruction with learned depth prediction

Real-time High-accuracy Three-Dimensional Reconstruction with Consumer RGB-D Cameras