ACF-Net: Asymmetric Cascade Fusion for 3D Detection with LiDAR Point Clouds and Images

Yonglin Tian, Xianjing Zhang, Xiao Wang, Jintao Xu, Jiangong Wang, Rui Ai, Weihao Gu, Weiping Ding
DOI: https://doi.org/10.1109/tiv.2023.3341223
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Recognizing and exploiting the complementary information that arises from modality-intrinsic properties is crucial in multimodal 3D detection. However, most current fusion-based 3D detection approaches follow symmetric fusion paradigms, adopting early, middle, or late fusion, and thereby ignore the unequal status of data from different modalities. In this paper, we propose an asymmetric cascade fusion network, organized according to the timing of fusion, that exploits both the structural information from point clouds and the complementary semantic information from images. A multi-stage cascade design for 3D object detection iteratively refines predictions, and several late image features (detection clues, segmentation clues, and deep encoder features) are incorporated into different stages of the LiDAR branch to maintain the integrity of image features and enable deep multimodal interactions. In addition, to mitigate the effects of down-sampling voxelized features and possible mismatches between multimodal data, we propose proxy-based cross-modality sampling, which utilizes high-density point cloud coordinates, and develop an image degeneration process that simulates noise in cross-modality matching for robust training. Extensive experiments on KITTI and the Waymo Open Dataset validate the effectiveness of the proposed method.
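To make the idea of sampling image features at point cloud locations concrete, the following is a minimal, generic sketch of point-to-image projection followed by bilinear feature lookup. It is not the paper's proxy-based cross-modality sampling, whose details the abstract does not give. The function names `project_points` and `bilinear_sample`, the 3 x 4 projection matrix `proj`, and the H x W x C feature layout are all assumptions made for illustration.

```python
import numpy as np


def project_points(points_xyz, proj):
    """Project N x 3 points to pixel coordinates with a 3 x 4 matrix.

    Assumes the points are already expressed in a frame that `proj`
    maps to the image (hypothetical calibration, not from the paper).
    Returns (uv, valid), where valid marks points in front of the camera.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])  # N x 4 homogeneous
    uvw = homo @ proj.T                              # N x 3
    depth = uvw[:, 2]
    valid = depth > 1e-6
    uv = np.zeros((n, 2))
    uv[valid] = uvw[valid, :2] / depth[valid, None]  # perspective divide
    return uv, valid


def bilinear_sample(feat, uv):
    """Bilinearly sample an H x W x C feature map at N x 2 pixel coords.

    uv[:, 0] is the column (u) and uv[:, 1] is the row (v); coordinates
    outside the map are clamped to the border.
    """
    h, w, _ = feat.shape
    u = np.clip(uv[:, 0], 0, w - 1)
    v = np.clip(uv[:, 1], 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv
```

Sampling at the original point coordinates, rather than at the centers of down-sampled voxels, is one plausible reading of why high-density coordinates help: the lookup positions keep their sub-voxel precision.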