Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Lingfeng Shen,Yanlong Cao,Wenbin Zhu,Kai Ren,Yejun Shou,Haocheng Wang,Zhijie Xu
DOI: https://doi.org/10.1016/j.cag.2024.104105
IF: 1.821
2024-01-01
Computers & Graphics
Abstract:Semantic segmentation has made notable strides in analysing homogeneous large-scale 3D scenes, yet its application to varied scenes with diverse characteristics poses considerable challenges. Traditional methods have been hampered by the dependence on resource-intensive neighborhood search algorithms, leading to elevated computational demands. To overcome these limitations, we introduce the MFAF-SCNet, a novel and computationally streamlined approach for voxel-based sparse convolutional. Our key innovation is the multi-scale feature adaptive fusion (MFAF) module, which intelligently applies a spectrum of convolution kernel sizes at the network’s entry point, enabling the extraction of multi-scale features. It adaptively calibrates the feature weighting to achieve optimal scale representation for different objects. Further augmenting our methodology is the LKSNet, an original sparse convolutional backbone designed to tackle the inherent inconsistencies in point cloud distribution. This is achieved by integrating inverted bottleneck structures with large kernel convolutions, significantly bolstering the network’s feature extraction and spatial correlation proficiency. The efficacy of MFAF-SCNet was rigorously tested against three large-scale benchmark datasets—ScanNet and S3DIS for indoor scenes, and SemanticKITTI for outdoor scenes. The experimental results underscore our method’s competitive edge, achieving high-performance benchmarks while ensuring computational efficiency.
What problem does this paper attempt to address?