Closing the Calibration Gap: A Real-Time Multi-Modal Fusion Framework for 3D Semantic Segmentation

Feng Jiang, Wanqing Peng, Chaoping Tu, Xiaoyan Li, Jun Li, Hanqing Huang, Di Feng, Jian Pu
DOI: https://doi.org/10.1109/tiv.2024.3505261
Impact factor: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: LiDAR and camera are two critical sensors for multi-modal 3D semantic segmentation, and their data should be fused efficiently and robustly to ensure safety in diverse real-world scenarios. However, existing multi-modal methods face two key challenges: 1) difficulty with efficient deployment and real-time execution; and 2) drastic performance degradation under weak calibration between LiDAR and cameras. To address these challenges, we propose CPGNet-LCF, a new multi-modal fusion framework that extends the LiDAR-only CPGNet. CPGNet-LCF addresses the first challenge by inheriting the easy deployment and real-time capabilities of CPGNet. For the second challenge, we introduce a novel weak calibration knowledge distillation strategy during training to improve robustness against weak calibration. CPGNet-LCF achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks. Notably, it can be easily deployed to run at 20 ms per frame on a single Tesla V100 GPU using TensorRT in FP16 mode. Furthermore, we benchmark performance at four weak calibration levels, demonstrating the robustness of the proposed approach. Our code is available at https://github.com/humemarx/CPG-LCF.
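The abstract names two ideas without detailing them: weak LiDAR-camera calibration, and a knowledge distillation strategy that makes the network tolerate it. The following is a minimal, stdlib-only Python sketch of both ideas under stated assumptions; it is not the CPGNet-LCF implementation. It assumes weak calibration can be modeled as uniform noise on the extrinsic rotation (Euler angles) and translation, projects points with a standard pinhole camera model, and uses the generic temperature-scaled KL divergence (Hinton-style distillation) between per-point class logits of a teacher fed accurately calibrated inputs and a student fed perturbed ones. All function names (`rot_zyx`, `perturb_extrinsic`, `project`, `kd_loss`), the intrinsics, and the noise magnitudes are hypothetical and chosen for illustration only; they are not the paper's four calibration levels.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def rot_zyx(yaw: float, pitch: float, roll: float) -> Mat3:
    """Rotation matrix Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def mat_mul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(a: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(a[i][k] * v[k] for k in range(3)) for i in range(3))


def perturb_extrinsic(R: Mat3, t: Vec3, max_rot_deg: float, max_trans_m: float,
                      rng: random.Random) -> Tuple[Mat3, Vec3]:
    """Hypothetical weak-calibration model: compose the true LiDAR->camera
    transform with a random rotation (uniform per Euler angle, degrees) and a
    random translation offset (uniform per axis, meters)."""
    angles = [math.radians(rng.uniform(-max_rot_deg, max_rot_deg)) for _ in range(3)]
    dR = rot_zyx(*angles)
    dt = mat_vec(dR, t)  # rotate the original translation with the noise rotation
    t_noisy = tuple(dt[i] + rng.uniform(-max_trans_m, max_trans_m) for i in range(3))
    return mat_mul(dR, R), t_noisy


def project(p: Vec3, R: Mat3, t: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a LiDAR point p into pixel coordinates (u, v)."""
    rp = mat_vec(R, p)
    x, y, z = rp[0] + t[0], rp[1] + t[1], rp[2] + t[2]
    if z <= 1e-6:
        return None  # point lies behind the camera
    return fx * x / z + cx, fy * y / z + cy


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher: Sequence[Sequence[float]], student: Sequence[Sequence[float]],
            temperature: float = 2.0) -> float:
    """Mean per-point KL(teacher || student) on temperature-softened class
    probabilities, scaled by T^2 (generic distillation, assumed here)."""
    total = 0.0
    for t_logits, s_logits in zip(teacher, student):
        p = softmax(t_logits, temperature)
        q = softmax(s_logits, temperature)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * total / len(teacher)


if __name__ == "__main__":
    rng = random.Random(0)
    R_true = rot_zyx(0.0, 0.0, 0.0)     # toy setup: LiDAR axes aligned with camera axes
    t_true = (0.0, 0.0, 0.0)
    K = (1000.0, 1000.0, 800.0, 450.0)  # illustrative fx, fy, cx, cy
    point = (1.0, 0.5, 20.0)            # a point 20 m in front of the camera
    u0, v0 = project(point, R_true, t_true, *K)
    # Illustrative perturbation magnitudes (deg, m); NOT the paper's levels.
    for max_rot, max_trans in [(0.5, 0.02), (1.0, 0.05), (2.0, 0.1)]:
        R_w, t_w = perturb_extrinsic(R_true, t_true, max_rot, max_trans, rng)
        u, v = project(point, R_w, t_w, *K)
        shift = math.hypot(u - u0, v - v0)
        print(f"+/-{max_rot} deg, +/-{max_trans} m -> pixel shift {shift:.1f} px")

    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]  # logits from well-calibrated input
    student = [[1.0, 1.0, -0.5], [0.2, 1.2, 0.4]]  # logits from perturbed input
    print(f"KD loss: {kd_loss(teacher, student):.4f}")
```

In this toy setup, a rotation error about an axis perpendicular to the optical axis moves the projection of a point near the image center by roughly fx times the angle in radians (about 17 px per degree at fx = 1000), so image features sampled at projected LiDAR locations can land on the wrong object. A distillation term of this kind pushes the student's predictions on perturbed inputs toward the teacher's predictions on clean ones; whether CPGNet-LCF uses this exact loss form is not stated in the abstract.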