MSGFusion: Multi-scale Semantic Guided LiDAR-Camera Fusion for 3D Object Detection

Huming Zhu, Yiyu Xue, Xinyue Cheng, Biao Hou
DOI: https://doi.org/10.1109/ijcnn60899.2024.10651407
2024-01-01
Abstract: 3D object detection is a key technology in autonomous driving perception and provides the basis for safe and reliable autonomous driving. To address false positives on low-resolution objects in point clouds, we present Multi-scale Semantic Guided LiDAR-Camera Fusion for 3D Object Detection (MSGFusion), which deeply fuses image features with LiDAR point features. Specifically, we design a multi-scale DenseFusion module, which serially aggregates image features, point-wise features, and voxel-wise feature volumes at different scales. We also design a new Image-based Predicted Keypoint Weighting (I-PKW) module, which predicts the object points from the predicted foreground score map. Given the 3D proposals generated by the voxel CNN, we propose RoI-Pillar pooling, which abstracts features by aggregating the keypoints inside each RoI by pillars. Compared with RoI-grid pooling, pillar-based feature encoding better matches the distribution of the fused-feature keypoints, which helps to regress the classification confidence and bounding box accurately. Extensive experiments on the KITTI dataset show the superiority of MSGFusion.
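To make the idea of pillar-based aggregation concrete, the following is a minimal NumPy sketch of pooling keypoint features inside one RoI by vertical pillars. It is an illustration under stated assumptions, not the authors' implementation: the function name `roi_pillar_pool`, the axis-aligned RoI (no yaw rotation), the fixed `grid` resolution, and the use of max pooling are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def roi_pillar_pool(points, feats, roi_center, roi_size, grid=(2, 2)):
    """Max-pool keypoint features into a (gx, gy) grid of vertical pillars.

    Assumptions for illustration only: the RoI is axis-aligned (no yaw),
    and features in each pillar are combined by an element-wise max.
    """
    points = np.asarray(points, dtype=float)   # (N, 3) keypoint xyz
    feats = np.asarray(feats, dtype=float)     # (N, C) keypoint features
    center = np.asarray(roi_center, dtype=float)
    size = np.asarray(roi_size, dtype=float)
    gx, gy = grid

    # Keep only the keypoints that fall inside the RoI box.
    lo = center - size / 2.0
    hi = center + size / 2.0
    inside = np.all((points >= lo) & (points < hi), axis=1)

    pooled = np.full((gx, gy, feats.shape[1]), -np.inf)
    p, f = points[inside], feats[inside]

    # A pillar is indexed by x and y only; the z axis is collapsed.
    ix = np.minimum(((p[:, 0] - lo[0]) / size[0] * gx).astype(int), gx - 1)
    iy = np.minimum(((p[:, 1] - lo[1]) / size[1] * gy).astype(int), gy - 1)
    for i, j, v in zip(ix, iy, f):
        pooled[i, j] = np.maximum(pooled[i, j], v)

    # Pillars that received no keypoints get a zero feature vector.
    pooled[np.isneginf(pooled)] = 0.0
    return pooled


# Toy usage: four keypoints with 2-D features, one of them outside the RoI.
example_points = [[0.1, 0.1, 0.0], [0.9, 0.9, 0.5], [0.2, 0.1, -0.5], [5.0, 5.0, 5.0]]
example_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [9.0, 9.0]]
example_out = roi_pillar_pool(example_points, example_feats,
                              roi_center=(0.5, 0.5, 0.0),
                              roi_size=(1.0, 1.0, 2.0))
```

Because every keypoint in a vertical column contributes to the same cell regardless of height, a pillar layout stays populated even when keypoints are sparse along z, which is one plausible reading of why the abstract contrasts it with RoI-grid pooling.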