Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie
2024-03-01
Abstract: This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS), with a focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution. The former arises from non-uniform point sampling, allowing models to distinguish the density disparities between foreground and background for easier segmentation. The latter results from sampling only 2,048 points, limiting semantic information and deviating from the real-world practice. To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built. Moreover, we propose a novel FS-PCS model. While previous methods are based on feature optimization by mainly refining support features to enhance prototypes, our method is based on correlation optimization, referred to as Correlation Optimization Segmentation (COSeg). Specifically, we compute Class-specific Multi-prototypical Correlation (CMC) for each query point, representing its correlations to category prototypes. Then, we propose the Hyper Correlation Augmentation (HCA) module to enhance CMC. Furthermore, tackling the inherent property of few-shot training to incur base susceptibility for models, we propose to learn non-parametric prototypes for the base classes during training. The learned base prototypes are used to calibrate correlations for the background class through a Base Prototypes Calibration (BPC) module. Experiments on popular datasets demonstrate the superiority of COSeg over existing methods. The code is available at:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two prominent issues in few-shot 3D point cloud semantic segmentation (FS-PCS):

1. **Foreground Leakage**: In the current FS-PCS setup, point sampling is non-uniform and draws more points from foreground classes than from background classes. The resulting density difference between foreground and background lets a model segment by density alone, without learning the knowledge adaptation patterns that few-shot segmentation is meant to test. As a result, current benchmarks do not reflect the true performance of previous models.
2. **Sparse Point Distribution**: The current FS-PCS setup samples only 2,048 points during training and inference, because the label propagation modules adopted by many FS-PCS methods are computationally expensive. This sparse input limits the semantic information available to the model and hinders its recognition ability. It is also inconsistent with real-world scenarios, which reduces the practical value of research progress in this field.

To rectify these problems, the authors propose a more rigorous, standardized setting and build a new benchmark on it. On this basis, they introduce a new FS-PCS model, **Correlation Optimization Segmentation (COSeg)**. Rather than refining features, COSeg directly shapes the relationship between query points and class prototypes by optimizing the **Class-specific Multi-prototypical Correlation (CMC)** computed for each query point, which is intended to improve generalization. The COSeg model includes the following key components:

- **Hyper Correlation Augmentation (HCA) module**: Refines the CMC by actively interacting with the high-dimensional relationships between points and class prototypes.
- **Base Prototypes Calibration (BPC) module**: Alleviates the model's susceptibility to base classes by learning non-parametric base prototypes during training and using them to calibrate the correlations for the background class, thereby improving segmentation accuracy on novel classes.

Experimental results show that COSeg outperforms existing methods on the S3DIS and ScanNet datasets. In particular, once foreground leakage is removed, the performance of existing methods drops significantly, while COSeg maintains high performance. This supports the advantage of COSeg on the few-shot 3D point cloud semantic segmentation task.
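The foreground leakage issue described above can be illustrated with a small, self-contained sketch. It is not the paper's sampling code: the toy scene, the `foreground_weight` knob, and the helper names `sample_indices` and `foreground_fraction` are all assumptions made for illustration, and sampling is done with replacement for simplicity. The sketch only shows how a sampler biased toward foreground points changes the foreground share of the sampled cloud, which is the density cue a model could exploit.

```python
import random


def sample_indices(labels, n_samples, foreground_weight=1.0, seed=0):
    """Draw n_samples point indices (with replacement) from a labelled scene.

    foreground_weight == 1.0 gives uniform sampling. Values above 1.0 mimic a
    sampler that favours foreground points (a hypothetical knob, not a
    parameter from the paper).
    """
    rng = random.Random(seed)
    weights = [foreground_weight if label == 1 else 1.0 for label in labels]
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)


def foreground_fraction(labels, indices):
    """Share of the sampled points that carry the foreground label (1)."""
    return sum(labels[i] for i in indices) / len(indices)


# Toy scene (assumed): 20,000 points, 10% of them foreground.
labels = [1] * 2_000 + [0] * 18_000
scene_frac = sum(labels) / len(labels)

# 2,048 points per sample, matching the point budget quoted in the text.
uniform_idx = sample_indices(labels, 2048, foreground_weight=1.0)
biased_idx = sample_indices(labels, 2048, foreground_weight=5.0)

uniform_frac = foreground_fraction(labels, uniform_idx)
biased_frac = foreground_fraction(labels, biased_idx)

print(f"scene foreground share:   {scene_frac:.3f}")
print(f"uniform sampling share:   {uniform_frac:.3f}")
print(f"biased sampling share:    {biased_frac:.3f}")
```

Under uniform sampling the foreground share of the sample stays close to that of the scene, whereas the biased sampler inflates it. In a real point cloud that inflation appears as denser foreground regions, which is the shortcut the standardized setting is designed to remove.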