CLFusion: 3D Semantic Segmentation Based on Camera and Lidar Fusion

Tianyue Wang, Rujun Song, Zhuoling Xiao, Bo Yan, Haojie Qin, Di He
DOI: https://doi.org/10.1109/iscas58744.2024.10558356
2024-01-01
Abstract: In autonomous driving, semantic segmentation is crucial for scene understanding. Current methods fall into two main categories: camera-based and Lidar-based. Lidar segmentation lacks texture features, while image segmentation lacks distance information. To address both limitations, this paper proposes fusing camera and Lidar data to achieve 3D semantic segmentation. The method uses a dual-stream encoder-decoder network to process camera images and Lidar point clouds, and incorporates a specially designed attention module for feature fusion. To avoid expensive manual annotation of 3D point clouds, the study also introduces a cross-dataset, cross-modal self-supervised training approach. Experimental results show a 2.4% improvement over the Lidar-only baseline on the SemanticKITTI dataset and a 6% improvement on the nuScenes dataset.
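The abstract does not specify how the attention module combines the two feature streams. As a rough illustration only, the sketch below shows one common way such a fusion can work: a channel-wise sigmoid gate computed from the concatenated Lidar and camera features of a single point, which decides how much of each camera channel is added to the Lidar feature. The function name `attention_fuse`, the gate form, and the parameters `weights` and `bias` are all assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(lidar_feat, camera_feat, weights, bias):
    """Fuse one point's Lidar and camera features with a channel-wise gate.

    Hypothetical sketch, not the paper's module.
    lidar_feat, camera_feat: lists of C floats for the same 3D point.
    weights: C rows of 2*C floats (stand-ins for learned parameters).
    bias: list of C floats.
    """
    concat = list(lidar_feat) + list(camera_feat)
    fused = []
    for c in range(len(lidar_feat)):
        # Attention score for channel c, computed from both modalities.
        score = sum(w * x for w, x in zip(weights[c], concat)) + bias[c]
        gate = sigmoid(score)
        # The Lidar feature is kept; the camera feature is added in
        # proportion to the gate.
        fused.append(lidar_feat[c] + gate * camera_feat[c])
    return fused


# With all-zero parameters every gate is 0.5, so half of each camera
# channel is added to the corresponding Lidar channel.
zero_weights = [[0.0] * 4 for _ in range(2)]
print(attention_fuse([1.0, 2.0], [4.0, 6.0], zero_weights, [0.0, 0.0]))
```

In this sketch a strongly negative score closes the gate, so the fused feature falls back to the Lidar-only feature. That fallback is a design choice of the example, not a claim about the paper's network.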