SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuojin Zhang, Weinan Chen, Yisheng Guan, Hong Zhang
2024-06-17
Abstract: Large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, resulting in a challenging problem in efficiently segmenting a point cloud. Most existing researches mainly focus on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context. In this paper, we propose a Similarity-Weighted Convolution and local-global Fusion Network, named SWCF-Net, which takes into account both local and global features. We propose a Similarity-Weighted Convolution (SWConv) to effectively extract local features, where similarity weights are incorporated into the convolution operation to enhance the generalization capabilities. Then, we employ a downsampling operation on the K and V channels within the attention module, thereby reducing the quadratic complexity to linear, enabling the Transformer to deal with large-scale point clouds. At last, orthogonal components are extracted in the global features and then aggregated with local features, thereby eliminating redundant information between local and global features and consequently promoting efficiency. We evaluate SWCF-Net on large-scale outdoor datasets SemanticKITTI and Toronto3D. Our experimental results demonstrate the effectiveness of the proposed network. Our method achieves a competitive result with less computational cost, and is able to handle large-scale point clouds efficiently.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses large-scale point cloud semantic segmentation. Most existing work concentrates on capturing local features and gives little weight to global ones, so the semantic context of a large scene is not fully used and segmentation quality suffers. The paper proposes **SWCF-Net**, which extracts local features with similarity-weighted convolution (SWConv), extracts global features with a lightweight Transformer, and combines the two through a local-global fusion strategy.

### Detailed Explanation

#### 1. **Background and Motivation**

- **Characteristics of large-scale point clouds**: Large-scale point clouds contain rich structural and semantic context information, but their irregular and unordered nature makes semantic segmentation difficult.
- **Limitations of existing methods**: Most existing methods focus on local feature extraction and neglect global features, which limits their performance on large-scale point clouds.

#### 2. **Main contributions of the paper**

- **Similarity-weighted convolution (SWConv)**: Similarity weights are incorporated into the convolution operation to improve its generalization ability, so that local features are extracted more effectively.
- **Lightweight Transformer**: The K and V channels in the attention module are downsampled, which reduces the attention cost from quadratic to linear and speeds up the global encoder. The global features are then combined with the local features through an orthogonal fusion strategy.
- **Efficiency and accuracy**: On the SemanticKITTI and Toronto3D datasets, SWCF-Net achieves competitive segmentation accuracy at a lower computational cost.

#### 3. **Method overview**

- **Local encoder**: Uses SWConv to extract local features. Its weighted convolution framework avoids hard classifiers, which is intended to improve the generalization ability of the local features.
- **Global encoder**: Adopts a lightweight Average Transformer. Downsampling reduces the computational complexity, while multi-head attention captures global correlations.
- **Fusion module**: Removes redundant information through orthogonal projection and then combines the local and global features.

#### 4. **Experimental results**

- **SemanticKITTI dataset**: SWCF-Net outperforms the baseline methods in several categories, particularly small-object categories (such as bicycles, motorcycles, cyclists, and motorcyclists).
- **Toronto3D dataset**: SWCF-Net performs well with or without RGB information, and it obtains the highest IoU value when RGB information is not used.

#### 5. **Ablation experiments**

- **Effectiveness of different modules**: Modules are replaced and added step by step to verify the contribution of SWConv, the Average Transformer, and the orthogonal fusion strategy.

### Summary

This paper tackles the combination of local and global features in large-scale point cloud semantic segmentation. By introducing similarity-weighted convolution and a local-global fusion strategy, it improves segmentation performance while keeping the computational cost low.
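
As a rough illustration of the global encoder and fusion module described above, the following NumPy sketch shows attention with downsampled K and V, followed by an orthogonal fusion step. It is not the authors' implementation. The function names `pooled_attention` and `orthogonal_fusion` are hypothetical, and several details are assumptions made for the example: K and V are downsampled by averaging contiguous chunks of points, a single attention head is used, the orthogonal component is computed per point by subtracting the projection of the global feature onto the local feature, and the result is aggregated by concatenation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pooled_attention(feats, w_q, w_k, w_v, num_groups):
    """Single-head attention with K and V average-pooled to num_groups tokens.

    feats: (n, d) per-point features; w_q, w_k, w_v: (d, d) projections.
    The score matrix is (n, m) instead of (n, n), so for a fixed m the
    cost grows linearly with the number of points n.
    """
    n, d = feats.shape
    m = max(1, min(num_groups, n))  # never create empty groups
    q = feats @ w_q
    k = feats @ w_k
    v = feats @ w_v
    # Assumption: pool over contiguous index chunks (illustrative only).
    groups = np.array_split(np.arange(n), m)
    k_small = np.stack([k[g].mean(axis=0) for g in groups])  # (m, d)
    v_small = np.stack([v[g].mean(axis=0) for g in groups])  # (m, d)
    scores = q @ k_small.T / np.sqrt(d)                      # (n, m)
    return softmax(scores, axis=-1) @ v_small                # (n, d)


def orthogonal_fusion(local_feats, global_feats, eps=1e-8):
    """Keep only the part of each global feature orthogonal to its local feature.

    Assumption: per-point vector projection, then concatenation.
    """
    dot = (global_feats * local_feats).sum(axis=1, keepdims=True)
    norm_sq = (local_feats * local_feats).sum(axis=1, keepdims=True)
    projection = dot / (norm_sq + eps) * local_feats
    orthogonal = global_feats - projection
    return np.concatenate([local_feats, orthogonal], axis=1)


# Small demonstration with random data standing in for real features.
rng = np.random.default_rng(0)
n_points, dim, n_groups = 1000, 8, 16
local = rng.normal(size=(n_points, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

global_ = pooled_attention(local, w_q, w_k, w_v, n_groups)  # shape (1000, 8)
fused = orthogonal_fusion(local, global_)                   # shape (1000, 16)
print(global_.shape, fused.shape)
```

With `n_groups` fixed, the attention score matrix has `n_points * n_groups` entries rather than `n_points ** 2`, which is the source of the linear cost. The second half of each fused feature is orthogonal to the first half, so it adds only the global information that the local feature does not already carry.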