Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao
2023-08-25
Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot semantic segmentation methods first pre-train the models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes. To tackle these issues, we propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and a further training-based variant, TFS3D-T. Without any learnable parameters, TFS3D extracts dense representations by trigonometric positional encodings, and achieves comparable performance to previous training-based methods. Due to the elimination of pre-training, TFS3D can alleviate the domain gap issue and save a substantial amount of time. Building upon TFS3D, TFS3D-T only requires training a lightweight query-support transferring attention (QUEST), which enhances the interaction between the few-shot query and support data. Experiments demonstrate that TFS3D-T improves on previous state-of-the-art methods by +6.93% and +17.96% mIoU on S3DIS and ScanNet, respectively, while reducing the training time by 90%, indicating superior effectiveness and efficiency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address several key issues in 3D point cloud semantic segmentation:

1. **Data Dependency**: Existing 3D segmentation methods rely heavily on large-scale annotated datasets, which are costly and time-consuming to build.
2. **Domain Gap**: Current few-shot 3D semantic segmentation methods typically pre-train on "seen" categories and then test on "unseen" categories. This pre-training phase introduces significant time overhead and causes a noticeable domain gap on the "unseen" categories.
3. **Training Complexity**: Traditional few-shot learning methods involve two stages, pre-training and episodic training, which makes the training process complex and resource-intensive.

To address these issues, the authors propose an efficient training-free few-shot 3D segmentation framework (TFS3D) and a training-based variant (TFS3D-T). Specifically:

- **TFS3D**: Has no learnable parameters. Its encoder extracts dense representations using trigonometric positional encodings and achieves performance comparable to previous training-based methods. By eliminating the pre-training phase, TFS3D alleviates the domain gap and saves a substantial amount of time.
- **TFS3D-T**: Builds on TFS3D and trains only a lightweight query-support transferring attention module (QUEST), which enhances the interaction between query and support data and further improves performance.

Experimental results show that TFS3D-T improves mIoU over previous state-of-the-art methods by 6.93% on S3DIS and 17.96% on ScanNet, while reducing training time by 90%, demonstrating both effectiveness and efficiency.
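
To make the idea of parameter-free features concrete, the sketch below shows one common way to build trigonometric positional encodings from raw 3D coordinates. It is an illustration under stated assumptions, not the paper's actual encoder: the function name `trig_positional_encoding`, the parameters `num_freqs` and `base`, and the geometric frequency schedule are all hypothetical choices made for this example.

```python
import numpy as np


def trig_positional_encoding(xyz, num_freqs=4, base=2.0):
    """Map (N, 3) coordinates to (N, 3 * 2 * num_freqs) sin/cos features.

    Hypothetical helper: the frequency schedule base**k is an assumption,
    not a detail taken from the paper.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    # One frequency per band: base**0, base**1, ..., base**(num_freqs - 1).
    freqs = base ** np.arange(num_freqs)
    # Scale every coordinate by every frequency: shape (N, 3, num_freqs).
    angles = xyz[:, :, None] * freqs[None, None, :]
    # Sines first, then cosines, per axis: shape (N, 3, 2 * num_freqs).
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    # Flatten the per-axis features into one vector per point.
    return feats.reshape(xyz.shape[0], -1)


# Two example points; nothing here is learned or trained.
points = np.array([[0.0, 0.0, 0.0], [0.5, -0.25, 1.0]])
features = trig_positional_encoding(points)
print(features.shape)
```

Because the mapping is a fixed function of the coordinates, it involves no weights to fit, which is the property that lets a training-free pipeline skip pre-training on "seen" classes.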