Abstract: A large-scale point cloud consists of a multitude of individual objects and therefore carries rich structural and semantic contextual information, which makes efficient segmentation a challenging problem. Most existing research focuses on capturing intricate local features without giving due consideration to global ones, and thus fails to leverage semantic context. In this paper, we propose a Similarity-Weighted Convolution and local-global Fusion Network, named SWCF-Net, which takes into account both local and global features. We propose a Similarity-Weighted Convolution (SWConv) to effectively extract local features, where similarity weights are incorporated into the convolution operation to enhance its generalization capabilities. Then, we apply a downsampling operation to the K and V channels within the attention module, reducing the quadratic complexity to linear and enabling the Transformer to deal with large-scale point clouds. Finally, orthogonal components are extracted from the global features and aggregated with the local features, eliminating redundant information between local and global features and thereby improving efficiency. We evaluate SWCF-Net on the large-scale outdoor datasets SemanticKITTI and Toronto3D. Our experimental results demonstrate the effectiveness of the proposed network. Our method achieves competitive results at a lower computational cost and handles large-scale point clouds efficiently.
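The effect of downsampling K and V can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `pooled_kv_attention`, the `num_tokens` parameter, and the choice of averaging contiguous groups of points are all assumptions made for illustration. With the number of pooled tokens m held fixed, the score matrix has shape (n, m) instead of (n, n), so the cost grows linearly with the number of points n.

```python
import numpy as np

def pooled_kv_attention(x, w_q, w_k, w_v, num_tokens):
    """Single-head attention whose keys and values are average-pooled to num_tokens rows."""
    q = x @ w_q  # (n, d): one query per point
    k = x @ w_k  # (n, d)
    v = x @ w_v  # (n, d)

    # Assumed pooling scheme: split the n points into num_tokens contiguous
    # groups and average each group, shrinking K and V from n rows to m rows.
    k_small = np.stack([g.mean(axis=0) for g in np.array_split(k, num_tokens)])
    v_small = np.stack([g.mean(axis=0) for g in np.array_split(v, num_tokens)])

    # The score matrix is (n, m) rather than (n, n).
    scores = q @ k_small.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v_small, weights


rng = np.random.default_rng(0)
n, c, d, m = 1000, 16, 8, 32  # hypothetical sizes
x = rng.normal(size=(n, c))
w_q, w_k, w_v = (rng.normal(size=(c, d)) for _ in range(3))
out, weights = pooled_kv_attention(x, w_q, w_k, w_v, num_tokens=m)
print(out.shape, weights.shape)  # (1000, 8) (1000, 32)
```

Every point still receives its own output row because only K and V are pooled; the queries keep full resolution.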

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the challenges of large-scale point cloud semantic segmentation. Most existing research focuses on capturing local features while neglecting global features, so semantic context information is not fully used when processing large-scale point clouds, which degrades segmentation quality. The paper proposes a network named **SWCF-Net**, which extracts both local and global features by combining similarity-weighted convolution (SWConv) with a local-global fusion strategy to improve the semantic segmentation of large-scale point clouds.

### Detailed Explanation

#### 1. **Background and Motivation**

- **Characteristics of large-scale point clouds**: Large-scale point clouds contain rich structural and semantic context information, but their irregular and unordered nature makes semantic segmentation difficult.
- **Limitations of existing methods**: Most existing methods focus on extracting local features and ignore global features, resulting in poor performance on large-scale point clouds.

#### 2. **Main contributions of the paper**

- **Similarity-weighted convolution (SWConv)**: Introducing similarity weights improves the generalization ability of the 3D convolution operation, so that local features are extracted more effectively.
- **Lightweight Transformer**: Downsampling the K and V channels in the attention module accelerates the Transformer's global encoder, and the global features are combined with the local features through an orthogonal fusion strategy.
- **Efficiency and accuracy**: On the SemanticKITTI and Toronto3D datasets, SWCF-Net achieves competitive segmentation accuracy while consuming fewer computational resources.

#### 3. **Method overview**

- **Local encoder**: Uses SWConv to extract local features. The weighted convolution framework avoids hard classifiers, improving the similarity and generalization ability of local features.
- **Global encoder**: Adopts a lightweight Average Transformer. Downsampling reduces the computational complexity, while the multi-head attention mechanism captures global correlations.
- **Fusion module**: Eliminates redundant information through orthogonal projection and combines local and global features to improve overall performance.

#### 4. **Experimental results**

- **SemanticKITTI dataset**: SWCF-Net significantly outperforms the baseline methods in multiple categories, especially small-object categories (such as bicycles, motorcycles, cyclists, and motorcyclists).
- **Toronto3D dataset**: SWCF-Net performs well whether or not RGB information is used. In particular, it obtains the highest IoU value when no RGB information is available.

#### 5. **Ablation experiments**

- **Effectiveness of different modules**: By gradually replacing and adding modules, the experiments verify the effectiveness of SWConv and the Average Transformer, as well as the performance gain from the orthogonal fusion strategy.

### Summary

This paper tackles the problem of combining local and global features in large-scale point cloud semantic segmentation by introducing similarity-weighted convolution and a local-global fusion strategy, improving both segmentation performance and efficiency.
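The idea of removing redundancy through orthogonal projection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's fusion module: the function name `orthogonal_fusion`, the per-point projection of the global feature onto the local feature, and the final concatenation are all hypothetical choices. For each point, the part of the global feature that is parallel to the local feature is subtracted, and only the orthogonal remainder is fused.

```python
import numpy as np

def orthogonal_fusion(local_feat, global_feat, eps=1e-8):
    """Fuse local features with the component of global features orthogonal to them."""
    # Per-point inner product <g, l> and squared norm |l|^2, each of shape (n, 1).
    dot = np.sum(global_feat * local_feat, axis=1, keepdims=True)
    norm_sq = np.sum(local_feat * local_feat, axis=1, keepdims=True)

    # Projection of g onto l: the information already carried by the local feature.
    # eps guards against division by zero for all-zero local features.
    projection = dot / (norm_sq + eps) * local_feat

    # Orthogonal remainder: what the global feature adds beyond the local one.
    orthogonal = global_feat - projection

    # Assumed aggregation: channel-wise concatenation, giving shape (n, 2c).
    fused = np.concatenate([local_feat, orthogonal], axis=1)
    return fused, orthogonal


rng = np.random.default_rng(0)
local_feat = rng.normal(size=(5, 8))   # hypothetical per-point local features
global_feat = rng.normal(size=(5, 8))  # hypothetical per-point global features
fused, orthogonal = orthogonal_fusion(local_feat, global_feat)

# Each orthogonal component has (numerically) zero inner product with its local feature.
residual = np.sum(orthogonal * local_feat, axis=1)
print(fused.shape, np.allclose(residual, 0.0, atol=1e-6))
```

Concatenation is only one way to aggregate the two parts; a sum or a learned layer over the same inputs would fit the description equally well.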

SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

Real-time Semantic Segmentation with Weighted Factorized-Depthwise Convolution

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

PointMS: Semantic Segmentation for Point Cloud Based on Multi-scale Directional Convolution

SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation

Associate Semantic-Instance Segmentation of 3D Point Clouds Based on Local Feature Extraction

A Local-Global Feature Fusing Method for Point Clouds Semantic Segmentation

Feature Graph Convolution Network With Attentive Fusion for Large-Scale Point Clouds Semantic Segmentation

MLFNet- Point Cloud Semantic Segmentation Convolution Network Based on Multi-Scale Feature Fusion

A Large-Scale Point Cloud Semantic Segmentation Network Via Local Dual Features and Global Correlations

LLGF-Net: Learning Local and Global Feature Fusion for 3D Point Cloud Semantic Segmentation

SFL-NET: Slight Filter Learning Network for Point Cloud Semantic Segmentation

Semantic Segmentation of Point Cloud Scene via Multi-Scale Feature Aggregation and Adaptive Fusion

Large-scale point cloud semantic segmentation via local perception and global descriptor vector

A Multi-scale Network for Semantic Segmentation of 3D Point Clouds

A Large-Scale Network Construction and Lightweighting Method for Point Cloud Semantic Segmentation

FA-ResNet: Feature affine residual network for large-scale point cloud segmentation

SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

Local Transformer Network on 3D Point Cloud Semantic Segmentation

SGT-Net: A Transformer-Based Stratified Graph Convolutional Network for 3D Point Cloud Semantic Segmentation

MFFNet: Multimodal Feature Fusion Network for Point Cloud Semantic Segmentation