Abstract: Point cloud segmentation is essential for scene understanding and provides high-level information for applications such as autonomous driving, robotics, and virtual reality. To improve the accuracy and robustness of point cloud segmentation, many researchers have fused camera images to supply complementary color and texture information. The common fusion strategy combines convolutional operations with concatenation, element-wise addition, or element-wise multiplication. However, conventional convolutional operators confine the fusion of modal features to their receptive fields, so the fusion can be incomplete and limited. In addition, encoder-decoder segmentation networks cannot explicitly perceive segmentation boundary information, which leads to semantic ambiguity and classification errors at object edges. These errors are further amplified in point cloud segmentation tasks and significantly reduce segmentation accuracy. To address these issues, we propose a novel self-attention multi-modal fusion network for point cloud semantic segmentation. First, to effectively fuse features from different modalities, we propose a Self-Cross Fusion Module (SCF), which models long-range modality dependencies and transfers complementary image information to the point cloud to fully leverage modality-specific advantages. Second, we design the Salience Refinement Module (SR), which calculates the importance of channels in the feature maps and global descriptors to enhance the representation capability of salient modal features. Finally, we propose the Local-aware Anisotropy Loss to measure element-level importance in the data and explicitly provide boundary information to the model, which alleviates the inherent semantic ambiguity problem in segmentation networks. Extensive experiments on two benchmark datasets demonstrate that our proposed method surpasses current state-of-the-art methods.
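To make the contrast with receptive-field-limited convolutional fusion concrete, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention, in which every point feature attends to every image feature, followed by a squeeze-and-excitation-style channel gate. It is not the SCF or SR module itself, whose internals the abstract does not specify; the function names, tensor shapes, residual connection, and random weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_fuse(point_feats, image_feats, w_q, w_k, w_v):
    """Hypothetical fusion step: points query image features.

    point_feats: (N, C) per-point features
    image_feats: (M, C) per-pixel (or per-patch) features
    Returns fused point features (N, C) and attention weights (N, M).
    """
    q = point_feats @ w_q                      # (N, D) queries from points
    k = image_feats @ w_k                      # (M, D) keys from the image
    v = image_feats @ w_v                      # (M, C) values from the image
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) all point-pixel pairs
    attn = softmax(scores, axis=-1)            # each row sums to 1
    fused = point_feats + attn @ v             # residual add (an assumption)
    return fused, attn


def channel_gate(feats, w1, w2):
    """Hypothetical channel reweighting from a global descriptor."""
    descriptor = feats.mean(axis=0)            # (C,) global average per channel
    hidden = np.maximum(descriptor @ w1, 0.0)  # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # (C,) sigmoid gate in (0, 1)
    return feats * gate, gate


# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
N, M, C, D = 6, 10, 8, 4
point_feats = rng.normal(size=(N, C))
image_feats = rng.normal(size=(M, C))
w_q = rng.normal(size=(C, D))
w_k = rng.normal(size=(C, D))
w_v = rng.normal(size=(C, C))
w1 = rng.normal(size=(C, C // 2))
w2 = rng.normal(size=(C // 2, C))

fused, attn = cross_attention_fuse(point_feats, image_feats, w_q, w_k, w_v)
gated, gate = channel_gate(fused, w1, w2)
print(fused.shape, attn.shape, gated.shape)
```

Because the attention matrix has one entry per point-pixel pair, no pair is excluded by a kernel size, which is the sense in which such fusion is long-range; the cost is memory and compute that grow with N times M.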

Robust 3D Semantic Segmentation Method Based on Multi-Modal Collaborative Learning

A Multi-phase Camera-LiDAR Fusion Network for 3D Semantic Segmentation with Weak Supervision

Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving

Joint Semantic Segmentation using representations of LiDAR point clouds and camera images

Multi-modal LiDAR Point Cloud Semantic Segmentation with Salience Refinement and Boundary Perception

An RGB-D Fusion Based Semantic Segmentation Algorithm Based on Neighborhood Metric Relations

MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

LiDAR-Based Real-Time Panoptic Segmentation via Spatiotemporal Sequential Data Fusion

RGB and LiDAR Fusion-based 3D Semantic Segmentation for Autonomous Driving

Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation

Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection

Improved 3D Semantic Segmentation Model Based on RGB Image and LiDAR Point Cloud Fusion for Automantic Driving

Transformer-Based Cross-Modal Information Fusion Network for Semantic Segmentation

Closing the Calibration Gap: A Real-Time Multi-Modal Fusion Framework for 3D Semantic Segmentation

SIESEF-FusionNet: Spatial Inter-correlation Enhancement and Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic Segmentation

Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation

Road Segmentation with Image-LiDAR Data Fusion