Learnable scene prior for point cloud semantic segmentation

Yuanhao Chai, Jingyu Gong, Xin Tan, Jiachen Xu, Yuan Xie, Lizhuang Ma
DOI: https://doi.org/10.1007/s00371-024-03344-z
IF: 2.835
2024-04-09
The Visual Computer
Abstract: We propose Geo-SceneEncoder, a framework for semantic segmentation of point cloud scenes that combines a SceneEncoder module for learning a scene prior, an advanced geometric kernel for learning geometry information from the point cloud, and a region similarity loss for refining segmentation results. Global information plays a pivotal role in semantic segmentation, yet most recent works overlook its importance and fail to use it fully: they do not explicitly extract meaningful global information and merely use global features through concatenation. The SceneEncoder module provides scene-aware guidance for the final segmentation results. It learns to predict a scene descriptor that represents the categories present in the scene and uses this descriptor to directly filter out categories that do not belong to the scene. Additionally, to make better use of geometry information in the point cloud, we propose an advanced version of kernel correlation that extracts geometric features at various scales. We then design a region similarity loss to alleviate segmentation noise in local regions. This loss propagates distinguishing features to neighboring points with the same label, enhancing the discriminative ability of point-wise features. We integrate our methods into several prevailing networks and conduct comprehensive experiments on the benchmark datasets ScanNet, S3DIS, and ShapeNet. Results show that our methods greatly improve the performance of the baselines and outperform many state-of-the-art competitors. The source code is available at https://github.com/azuki-miho/GeoSceneEncoder.
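The filtering role of the scene descriptor can be illustrated with a minimal sketch. It assumes the descriptor is a per-scene vector with one presence score per category, applied by element-wise multiplication to each point's class probabilities. The function names, the toy class list, and the hard 0/1 descriptor are hypothetical choices for illustration; in the paper the descriptor is predicted by a learned module, and this sketch is not taken from the authors' code.

```python
def apply_scene_prior(point_probs, scene_descriptor):
    """Reweight per-point class probabilities by a per-scene presence vector.

    point_probs: list of per-point probability lists, one entry per class.
    scene_descriptor: one presence score in [0, 1] per class (assumed form).
    """
    filtered = []
    for probs in point_probs:
        if len(probs) != len(scene_descriptor):
            raise ValueError("class count mismatch")
        weighted = [p * s for p, s in zip(probs, scene_descriptor)]
        total = sum(weighted)
        # Renormalize so each point's scores still sum to 1; fall back to the
        # unfiltered scores if the prior zeroes out every class.
        filtered.append([w / total for w in weighted] if total > 0 else list(probs))
    return filtered


def predict_labels(point_probs):
    """Return the index of the highest-scoring class for each point."""
    return [max(range(len(p)), key=p.__getitem__) for p in point_probs]


# Toy bedroom scene with three hypothetical classes: 0 wall, 1 bed, 2 bathtub.
point_probs = [
    [0.2, 0.3, 0.5],  # point 0: local features wrongly favor "bathtub"
    [0.6, 0.3, 0.1],  # point 1: confidently "wall"
]
scene_descriptor = [1.0, 1.0, 0.0]  # assumed prior: no bathtub in this scene

labels_before = predict_labels(point_probs)
filtered_probs = apply_scene_prior(point_probs, scene_descriptor)
labels_after = predict_labels(filtered_probs)
```

In this toy case the prior suppresses the "bathtub" score, so point 0 switches from class 2 to class 1 while point 1 is unchanged. A soft descriptor with values strictly between 0 and 1 would down-weight unlikely categories instead of removing them outright.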
computer science, software engineering