Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation

Yifan Xu, Ziming Luo, Qianwei Wang, Vineet Kamat, Carol Menassa
2024-09-16
Abstract: Current open-vocabulary scene graph generation algorithms rely heavily on both 3D scene point cloud data and posed RGB-D images, and thus have limited applications in scenarios where RGB-D images or camera poses are not readily available. To solve this problem, we propose Point2Graph, a novel end-to-end point cloud-based 3D open-vocabulary scene graph generation framework in which the requirement of a posed RGB-D image series is eliminated. This hierarchical framework contains room and object detection/segmentation and open-vocabulary classification. For the room layer, we merge a geometry-based border detection algorithm with learning-based region detection to segment rooms, and create a "Snap-Lookup" framework for open-vocabulary room classification. In addition, we create an end-to-end pipeline for the object layer to detect and classify 3D objects based solely on 3D point cloud data. Our evaluation results show that our framework can outperform the current state-of-the-art (SOTA) open-vocabulary object and room segmentation and classification algorithms on widely used real-scene datasets.
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the dependency on posed RGB-D image sequences when generating open-vocabulary 3D scene graphs for robot navigation. Current open-vocabulary scene graph generation algorithms rely heavily on both 3D scene point cloud data and posed RGB-D images, which limits their use when RGB-D images or camera poses are unavailable. To solve this problem, the authors propose Point2Graph, a framework that generates open-vocabulary 3D scene graphs from point cloud data alone, eliminating the need for posed RGB-D images. The framework is hierarchical, with a room layer and an object layer, each covering detection/segmentation and open-vocabulary classification. For the room layer, the authors combine a geometry-based border detection algorithm with learning-based region detection to segment rooms, and create a "Snap-Lookup" framework for open-vocabulary room classification. For the object layer, the authors build an end-to-end pipeline that detects and classifies 3D objects using only 3D point cloud data. The reported evaluation shows that the framework can outperform current state-of-the-art open-vocabulary object and room segmentation and classification algorithms on widely used real-scene datasets.
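To make the hierarchy described above more concrete, here is a minimal Python sketch of a two-layer scene graph in which rooms contain objects. It is an illustration only, not the authors' implementation: the class names (`SceneGraph`, `RoomNode`, `ObjectNode`), the methods, and the demo labels are all hypothetical, and the labels are hard-coded rather than produced by any segmentation or open-vocabulary classification step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class ObjectNode:
    """Object-layer node (hypothetical): a label and a 3D centroid."""
    label: str
    centroid: Point


@dataclass
class RoomNode:
    """Room-layer node (hypothetical): a label and the objects it contains."""
    label: str
    objects: List[ObjectNode] = field(default_factory=list)


@dataclass
class SceneGraph:
    """Two-layer hierarchy: room id -> room node -> object nodes."""
    rooms: Dict[str, RoomNode] = field(default_factory=dict)

    def add_room(self, room_id: str, label: str) -> None:
        self.rooms[room_id] = RoomNode(label=label)

    def add_object(self, room_id: str, label: str, centroid: Point) -> None:
        self.rooms[room_id].objects.append(ObjectNode(label, centroid))

    def find(self, object_label: str) -> List[Tuple[str, Point]]:
        """Return (room_id, centroid) for every object with the given label."""
        return [
            (room_id, obj.centroid)
            for room_id, room in self.rooms.items()
            for obj in room.objects
            if obj.label == object_label
        ]


def build_demo_graph() -> SceneGraph:
    # Hard-coded demo data; a real pipeline would fill this from point clouds.
    graph = SceneGraph()
    graph.add_room("room_0", "kitchen")
    graph.add_room("room_1", "bedroom")
    graph.add_object("room_0", "chair", (1.0, 2.0, 0.4))
    graph.add_object("room_1", "bed", (5.0, 1.5, 0.3))
    graph.add_object("room_1", "chair", (4.0, 3.0, 0.4))
    return graph
```

A navigation query such as "go to a chair" could then be resolved with `build_demo_graph().find("chair")`, which returns the containing room and centroid of each matching object, so a planner can pick a room first and a target position second.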