Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao

2023-08-25

Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot semantic segmentation methods first pre-train the models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes. To tackle these issues, we propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and a further training-based variant, TFS3D-T. Without any learnable parameters, TFS3D extracts dense representations by trigonometric positional encodings, and achieves comparable performance to previous training-based methods. Due to the elimination of pre-training, TFS3D can alleviate the domain gap issue and save a substantial amount of time. Building upon TFS3D, TFS3D-T only requires training a lightweight query-support transferring attention (QUEST), which enhances the interaction between the few-shot query and support data. Experiments demonstrate that TFS3D-T improves over previous state-of-the-art methods by +6.93% and +17.96% mIoU on S3DIS and ScanNet, respectively, while reducing the training time by 90%, indicating superior effectiveness and efficiency.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper aims to address several key issues in 3D point cloud semantic segmentation:

1. **Data Dependency**: Existing 3D segmentation methods rely heavily on large-scale annotated datasets, which are costly and time-consuming to produce.
2. **Domain Gap**: Current few-shot 3D semantic segmentation methods typically pre-train on "seen" categories and then test on "unseen" categories. This pre-training phase introduces significant time overhead and causes a noticeable domain gap on the "unseen" categories.
3. **Training Complexity**: Traditional few-shot learning methods involve two stages, pre-training and episodic training, which makes the training process complex and resource-intensive.

To address these issues, the authors propose an efficient training-free few-shot 3D segmentation framework (TFS3D) and its training-based variant (TFS3D-T). Specifically:

- **TFS3D**: Extracts dense representations with a parameter-free encoder built on trigonometric positional encodings, and achieves performance comparable to existing training-based methods. By eliminating the pre-training phase, TFS3D alleviates the domain gap and saves a substantial amount of time.
- **TFS3D-T**: Builds on TFS3D by training only a lightweight query-support transferring attention module (QUEST), which enhances the interaction between query and support data and further improves performance.

Experimental results show that TFS3D-T improves mIoU over previous state-of-the-art methods by 6.93% on S3DIS and 17.96% on ScanNet, while reducing training time by 90%, demonstrating its effectiveness and efficiency.
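The idea of a parameter-free encoder can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`trig_encode`, `masked_prototype`, `segment_query`), the frequency schedule, and the prototype-plus-cosine-similarity matching step are all assumptions chosen for illustration. It only shows how point coordinates can be turned into dense features with sine and cosine functions, with no learnable weights, and how a query point cloud could then be labeled by comparing its features to class prototypes averaged from the support set.

```python
import numpy as np


def trig_encode(xyz, num_freqs=4):
    """Encode (N, 3) coordinates as (N, 6 * num_freqs) sin/cos features.

    Hypothetical frequency schedule: pi * 2^k for k = 0..num_freqs-1.
    There are no learnable parameters anywhere in this function.
    """
    xyz = np.asarray(xyz, dtype=np.float64)
    freqs = np.pi * (2.0 ** np.arange(num_freqs))          # (F,)
    angles = xyz[:, :, None] * freqs[None, None, :]        # (N, 3, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(xyz.shape[0], -1)                 # (N, 6F)


def masked_prototype(feats, mask):
    """Average the features of the points selected by a boolean mask.

    Assumes the mask selects at least one point.
    """
    mask = np.asarray(mask, dtype=bool)
    return feats[mask].mean(axis=0)


def segment_query(query_feats, prototypes):
    """Assign each query point the index of its most similar prototype,
    using cosine similarity (an assumed matching rule)."""
    eps = 1e-8
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    return (q @ p.T).argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support_xyz = rng.uniform(0.0, 1.0, size=(200, 3))   # toy support cloud
    support_fg = support_xyz[:, 2] > 0.5                  # toy foreground mask
    query_xyz = rng.uniform(0.0, 1.0, size=(100, 3))     # toy query cloud

    support_feats = trig_encode(support_xyz)
    prototypes = np.stack([
        masked_prototype(support_feats, ~support_fg),     # index 0: background
        masked_prototype(support_feats, support_fg),      # index 1: foreground
    ])
    labels = segment_query(trig_encode(query_xyz), prototypes)
    print(labels.shape)
```

Because every step is a fixed function of the coordinates, nothing here needs pre-training on "seen" classes. In the trained variant described above, a learned attention module would sit between the encoded query and support features; that part is omitted here because the summary does not give its details.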

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Few-Shot 3D Point Cloud Semantic Segmentation via Stratified Class-Specific Attention Based Transformer Network

Iterative Few-shot Semantic Segmentation from Image Label Text

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

A Simple Framework of Few-Shot Learning Using Sparse Annotations for Semantic Segmentation of 3-D Point Clouds

Label-Efficient Few-Shot Semantic Segmentation with Unsupervised Meta-Training

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation

Few-shot Semantic Segmentation Via Perceptual Attention and Spatial Control

Prototype Adaption and Projection for Few- and Zero-Shot 3D Point Cloud Semantic Segmentation

Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation

Few-Shot 3D Volumetric Segmentation with Multi-Surrogate Fusion

Few-shot 3D Point Cloud Semantic Segmentation

Disentangled Foreground-Semantic Adapter Network for Generalized Aerial Image Few-Shot Semantic Segmentation

Deep Reasoning Network for Few-shot Semantic Segmentation

Spatial Correlation Fusion Network for Few-Shot Segmentation

On filling the intra-class and inter-class gaps for few-shot segmentation

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

Rethinking and Improving Few-Shot Segmentation from a Contour-Aware Perspective