Point2Graph: An End-to-end Point Cloud-based 3D Open-Vocabulary Scene Graph for Robot Navigation

Yifan Xu, Ziming Luo, Qianwei Wang, Vineet Kamat, Carol Menassa

2024-09-16

Abstract: Current open-vocabulary scene graph generation algorithms rely heavily on both 3D scene point cloud data and posed RGB-D images, and thus have limited applicability in scenarios where RGB-D images or camera poses are not readily available. To solve this problem, we propose Point2Graph, a novel end-to-end point cloud-based 3D open-vocabulary scene graph generation framework that eliminates the need for posed RGB-D image series. This hierarchical framework comprises room and object detection/segmentation and open-vocabulary classification. For the room layer, we combine a geometry-based border detection algorithm with learning-based region detection to segment rooms, and we create a "Snap-Lookup" framework for open-vocabulary room classification. In addition, we create an end-to-end pipeline for the object layer that detects and classifies 3D objects based solely on 3D point cloud data. Our evaluation results show that our framework can outperform the current state-of-the-art (SOTA) open-vocabulary object and room segmentation and classification algorithms on widely used real-scene datasets.
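The room/object hierarchy can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, hypothetically, that each segmented room is stored as a 2D floor-plan polygon in the XY plane and each detected object as a labeled 3D centroid. Objects are attached to the room whose footprint contains their centroid, using the standard even-odd ray-casting point-in-polygon test. The names `Room`, `Object3D`, `point_in_polygon`, and `build_scene_graph` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Object3D:
    label: str                               # open-vocabulary object label
    centroid: tuple[float, float, float]     # (x, y, z) in scene coordinates


@dataclass
class Room:
    label: str                               # open-vocabulary room label
    footprint: list[tuple[float, float]]     # polygon vertices in the XY plane
    objects: list[Object3D] = field(default_factory=list)  # child nodes


def point_in_polygon(x: float, y: float, poly: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray from (x, y) toward +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed;
        # this also skips horizontal edges, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_scene_graph(rooms: list[Room], objects: list[Object3D]) -> list[Object3D]:
    """Attach each object to the first room whose footprint contains its centroid.

    The z coordinate is ignored. Returns the objects that fall outside every room.
    """
    unassigned = []
    for obj in objects:
        x, y, _ = obj.centroid
        for room in rooms:
            if point_in_polygon(x, y, room.footprint):
                room.objects.append(obj)
                break
        else:
            unassigned.append(obj)
    return unassigned
```

For example, with a kitchen footprint `[(0, 0), (4, 0), (4, 3), (0, 3)]` and an adjacent bedroom `[(4, 0), (8, 0), (8, 3), (4, 3)]`, a refrigerator centered at `(1.0, 2.5, 0.9)` is attached to the kitchen. An object at `(10.0, 1.0, 0.3)` is returned as unassigned. Ignoring height assumes a single-floor scene. A multi-floor scene would need an extra floor level in the hierarchy.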

Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the dependence on registered (posed) RGB-D image sequences when generating open-vocabulary 3D scene graphs for robot navigation. Current open-vocabulary scene graph generation algorithms rely heavily on both 3D scene point cloud data and registered RGB-D images, which limits their use when RGB-D images or camera pose information are unavailable. To solve this problem, the authors propose the Point2Graph framework, a method that generates open-vocabulary 3D scene graphs from point cloud data alone, eliminating the need for registered RGB-D images. The hierarchical framework comprises a room layer, which performs room detection/segmentation and classification, and an object layer, which performs object detection/segmentation and classification. For the room layer, the authors combine a geometry-based boundary detection algorithm with learning-based region detection to segment rooms, and they introduce a "Snap-Lookup" framework for open-vocabulary room classification. For the object layer, they build an end-to-end pipeline that detects and classifies 3D objects using only 3D point cloud data. Experimental results show that the framework outperforms current state-of-the-art open-vocabulary object and room segmentation and classification algorithms on widely used real-world scene datasets.
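The summary does not specify how "Snap-Lookup" scores candidate room labels. A common generic pattern for open-vocabulary classification is to place a query and a free-form list of text labels in a shared vision-language embedding space (for example, CLIP) and pick the label with the highest cosine similarity. This is an illustration of that pattern, not a reconstruction of Snap-Lookup. The sketch assumes the embeddings have already been computed. The toy vectors and the function name `classify_open_vocab` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def classify_open_vocab(query_emb: list[float],
                        label_embs: dict[str, list[float]]) -> tuple[str, float]:
    """Return the candidate label most similar to the query, with its score.

    The candidate set is an ordinary dict, so labels can be added or removed
    at query time without retraining; this is what makes the vocabulary 'open'.
    """
    if not label_embs:
        raise ValueError("need at least one candidate label")
    scores = {label: cosine_similarity(query_emb, emb)
              for label, emb in label_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-D "embeddings" standing in for real text-encoder outputs.
labels = {"kitchen": [1.0, 0.0, 0.0],
          "bedroom": [0.0, 1.0, 0.0],
          "bathroom": [0.0, 0.0, 1.0]}
print(classify_open_vocab([0.9, 0.1, 0.0], labels))
```

Because the label set is just data, a new category such as `"garage"` can be scored by adding its text embedding to `labels` without changing any other code. In practice, the quality of the result depends entirely on the embedding model, which this sketch abstracts away.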


Semantic Graph Based Place Recognition for 3D Point Clouds

Open-Vocabulary Octree-Graph for 3D Scene Understanding

3D Scene Graph Generation from Point Clouds

A New Approach of Point Cloud Processing and Scene Segmentation for Guiding the Visually Impaired

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Scene Segmentation and Understanding for Context-Free Point Clouds

MR-COGraphs: Communication-efficient Multi-Robot Open-vocabulary Mapping System via 3D Scene Graphs

Bidirectional Edge-Based 3D Scene Graph Generation from Point Clouds

Dynamic Scene Graph Generation of Point Clouds with Structural Representation Learning

Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows

The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs

3D Scene Graph Prediction on Point Clouds Using Knowledge Graphs

Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection

3Dgraphseg: A Unified Graph Representation-Based Point Cloud Segmentation Framework for Full-Range High-Speed Railway Environments

Graph-Based Robust Localization of Object-Level Map for Mobile Robotic Navigation

S-Graphs+: Real-time Localization and Mapping leveraging Hierarchical Representations