Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao
Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' classes. To tackle these issues, we propose a Non-parametric Network for few-shot 3D Segmentation, Seg-NN, and its Parametric variant, Seg-PN. Without training, Seg-NN extracts dense representations by hand-crafted filters and achieves comparable performance to existing parametric models. Due to the elimination of pre-training, Seg-NN can alleviate the domain gap issue and save a substantial amount of time. Based on Seg-NN, Seg-PN only requires training a lightweight QUEry-Support Transferring (QUEST) module, which enhances the interaction between the support set and query set. Experiments suggest that Seg-PN outperforms the previous state-of-the-art method by +4.19% and +7.71% mIoU on the S3DIS and ScanNet datasets, respectively, while reducing training time by 90%, indicating its effectiveness and efficiency.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper aims to address the dependency on large-scale datasets in 3D scene segmentation, particularly in few-shot learning scenarios. Current 3D few-shot segmentation methods typically require pre-training on "seen" categories and then evaluate their generalization performance on "unseen" categories. However, this pre-training phase not only introduces excessive time overhead but also leads to significant domain gaps on "unseen" categories.
To solve these issues, the authors propose a non-parametric network, Seg-NN, and its parametric variant, Seg-PN. The main features of these models are as follows:
1. **Non-parametric Network Seg-NN**:
- **No Training Required**: Seg-NN extracts dense representations with hand-crafted filters and needs no training, which removes the pre-training stage and alleviates the domain gap on "unseen" categories.
- **Efficiency**: By eliminating pre-training, Seg-NN saves a substantial amount of time.
2. **Parametric Network Seg-PN**:
- **Lightweight Module**: Seg-PN adds a lightweight QUEry-Support Transferring (QUEST) module on top of Seg-NN. QUEST is the only component that requires training, and it enhances the interaction between the support set and the query set.
- **Performance Improvement**: Experiments show that Seg-PN outperforms the previous state-of-the-art method by +4.19% and +7.71% mIoU on the S3DIS and ScanNet datasets, respectively, while reducing training time by 90%.
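The training-free idea behind Seg-NN can be illustrated with a minimal sketch. It assumes sine and cosine functions of the point coordinates as the hand-crafted filters, and cosine similarity to class prototypes as the labeling rule. The function names (`trig_embed`, `class_prototypes`, `segment`), the frequency choice, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def trig_embed(xyz, num_freqs=4):
    """Training-free per-point features: sin/cos of scaled coordinates (assumed filter)."""
    freqs = 2.0 ** np.arange(num_freqs)                 # (F,)
    angles = xyz[:, :, None] * freqs                    # (N, 3, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 3, 2F)
    return feats.reshape(xyz.shape[0], -1)              # (N, 6F)

def class_prototypes(feats, labels, num_classes):
    """Average the features of the support points that belong to each class."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def segment(query_feats, prototypes):
    """Label each query point with its most cosine-similar class prototype."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (q @ p.T).argmax(axis=1)

# Toy episode: two well-separated point clusters stand in for two classes.
rng = np.random.default_rng(0)
support_xyz = np.concatenate([rng.normal(0.0, 0.01, (20, 3)),
                              rng.normal(0.5, 0.01, (20, 3))])
support_labels = np.array([0] * 20 + [1] * 20)
query_xyz = np.concatenate([rng.normal(0.0, 0.01, (3, 3)),
                            rng.normal(0.5, 0.01, (3, 3))])

prototypes = class_prototypes(trig_embed(support_xyz), support_labels, num_classes=2)
pred = segment(trig_embed(query_xyz), prototypes)
print(pred)
```

Nothing in this sketch is learned: the features come from fixed functions and the classifier is a nearest-prototype lookup, which is why no pre-training on "seen" classes is involved.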
### Main Contributions
1. **Proposed a Non-parametric Few-shot Learning Framework Seg-NN** for 3D point cloud semantic segmentation, which can serve as a foundation for building better-performing parametric variants like Seg-PN.
2. **Designed a New Query-Support Interaction Module QUEST**, which adjusts class prototypes by learning the affinity between the support set and the query set in Seg-PN, thus mitigating prototype bias caused by small support sets.
3. **Validated the Effectiveness and Efficiency of the Proposed Methods through Comprehensive Experiments**, achieving state-of-the-art performance with minimal parameters and significantly simplifying the learning process.
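To make the prototype-adjustment idea in the second contribution more concrete, here is a rough sketch of one generic way to shift prototypes using support-query affinity. It uses plain cross-attention with a residual update, which is an assumption and not necessarily the QUEST design. `adjust_prototypes`, `w_proto`, and `w_query` are hypothetical names, and the two matrices stand in for learnable projections.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adjust_prototypes(prototypes, query_feats, w_proto, w_query):
    """Shift each support prototype toward the query features it has high affinity with.

    prototypes:  (C, D) class prototypes computed from the support set
    query_feats: (M, D) per-point features of the query set
    w_proto, w_query: (D, K) projections (learnable in a real model; assumed here)
    """
    p = prototypes @ w_proto                                   # (C, K)
    q = query_feats @ w_query                                  # (M, K)
    affinity = softmax(p @ q.T / np.sqrt(p.shape[1]), axis=1)  # (C, M), rows sum to 1
    return prototypes + affinity @ query_feats                 # residual update, (C, D)

# Toy call with identity projections in place of trained weights.
protos = np.eye(2, 4)
queries = np.tile(np.array([1.0, 2.0, 3.0, 4.0]), (5, 1))
adjusted = adjust_prototypes(protos, queries, np.eye(4), np.eye(4))
print(adjusted.shape)
```

The point of such an update is that a prototype averaged from only a few support points may be biased, and query features give additional evidence about where each class lies in feature space.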
### Experimental Results
- **Non-parametric Seg-NN**: On the S3DIS dataset, Seg-NN significantly outperforms Point-NN and, in some cases, even surpasses parametric methods that require training, such as DGCNN and ProtoNet.
- **Parametric Seg-PN**: On the S3DIS and ScanNet datasets, Seg-PN outperforms the previous state-of-the-art method by an average of +4.19% and +7.71% mIoU, respectively, across all four few-shot tasks.
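The results above are reported in mIoU (mean intersection over union). As a reminder of how that metric is commonly computed on per-point labels, here is a small sketch; `mean_iou` is a hypothetical helper, and the paper's exact evaluation protocol (for example, how episodes are averaged) is not reproduced.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:                      # skip classes absent from both arrays
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
print(mean_iou(pred, target, num_classes=2))
```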
### Conclusion
This paper effectively addresses the pre-training time and domain gap issues in existing methods by proposing non-parametric and parametric few-shot 3D scene segmentation methods, demonstrating high efficiency and superior performance in few-shot learning scenarios.