3D Scene Graph Generation from Point Clouds

Wenwen Wei,Ping Wei,Jialu Qin,Zhimin Liao,Shuaijie Wang,Xiang Cheng,Meiqin Liu,Nanning Zheng
DOI: https://doi.org/10.1109/tmm.2023.3331583
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract:Scene graph generation is a significant and challenging task for scene understanding. Most existing methods are confined to the 2D space (i.e. images) or additional use of segmentation information, while neglecting the richer spatial and geometric information of 3D space. In this paper, we propose a novel method to generate scene graphs from 3D point clouds. Specifically, our model consists of three parts: a point feature extraction backbone, a box head, and a relation head. The feature extraction backbone extracts base features directly from raw point clouds, and the box head produces detected 3D bounding boxes. Final 3D scene graphs are obtained from the relation head which takes the extracted features and 3D boxes as inputs. We also design a point RoI module which sequentially processes points inside 3D boxes with a bidirectional LSTM. To further leverage the geometric characteristics of point clouds, we propose a location attention module which learns the influence of relative locations between objects. We introduce the RelationScanNet dataset with densely annotated semantic and geometric relationships, which extends one of the most widely used dataset ScanNetV2 in 3D indoor scene understanding. We test the proposed method on the RelationScanNet dataset and 3DSSG dataset. The results prove the strength of our method.
What problem does this paper attempt to address?