A Multi-phase Camera-LiDAR Fusion Network for 3D Semantic Segmentation with Weak Supervision
Xuepeng Chang, Huihui Pan, Weichao Sun, Huijun Gao
DOI: https://doi.org/10.1109/tcsvt.2023.3241641
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Camera and LiDAR are indispensable perception units in autonomous driving, providing complementary environmental information for 3D semantic segmentation. Fusing the information of the two modalities is the key to accurate and robust semantic segmentation. However, three major factors restrict the performance of fusion-based methods: the reliability of image features, the contribution of different image features, and the trade-off between the image and point cloud results. This paper proposes a novel multi-phase fusion network for 3D semantic segmentation. First, because common datasets lack dense annotations, image features may be wrong; this paper takes the lead in treating this as a weak supervision problem and introduces a weakly supervised loss. Second, the proposed attention-based feature fusion module filters and reweights the image features effectively. Third, the results of the two modalities are further fused at the pixel level by a self-confidence-based late fusion module, so that their advantages complement each other. The proposed scheme is evaluated on the nuScenes and SemanticKITTI benchmarks, and the results are competitive with state-of-the-art methods. The ablation studies demonstrate the superiority of the method in segmenting sparse classes. In addition, this paper evaluates robustness: the results of the proposed method remain relatively accurate even when one of the sensors is faulty.
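Illustrative sketch: the abstract does not specify how the self-confidence based late fusion is computed, so the following is only one plausible reading, not the authors' module. It assumes that each branch outputs per-element class logits, that "self-confidence" is the maximum softmax probability of each branch, and that the two predictions are blended with weights proportional to those confidences. The function name `confidence_late_fusion` and the toy logits are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def confidence_late_fusion(image_logits, point_logits):
    """Blend two per-element class predictions by their own confidence.

    Assumption (not from the paper): confidence is the maximum softmax
    probability of each branch, and the fusion weights are the
    confidences normalized to sum to one.
    """
    p_img = softmax(image_logits)
    p_pts = softmax(point_logits)

    # Per-element confidence of each branch, shape (N, 1).
    c_img = p_img.max(axis=-1, keepdims=True)
    c_pts = p_pts.max(axis=-1, keepdims=True)

    # Convex combination: the more confident branch gets the larger weight.
    w_img = c_img / (c_img + c_pts)
    fused = w_img * p_img + (1.0 - w_img) * p_pts
    return fused, fused.argmax(axis=-1)


# Toy example with 2 elements and 3 classes (hypothetical values).
# Element 0: the image branch is confident, the point branch is not.
# Element 1: the point branch is confident, the image branch is not.
image_logits = np.array([[4.0, 0.0, 0.0], [0.1, 0.0, 0.2]])
point_logits = np.array([[0.0, 0.2, 0.1], [0.0, 5.0, 0.0]])

fused, labels = confidence_late_fusion(image_logits, point_logits)
print(labels)
```

In this toy case each element takes the label of whichever branch is more confident, which is the complementary behavior the abstract attributes to late fusion. A learned confidence estimate could replace the maximum softmax probability without changing the overall structure.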
engineering, electrical & electronic