Two-Stage Feature Attention Fusion for Radar-Camera 3D Object Detection

Fenlan Zhang, Jingjing Li, Yuanfa Ji, Xiyan Sun
DOI: https://doi.org/10.1145/3625403.3625424
2023-01-01
Abstract: Multi-sensor fusion is essential for 3D object detection in intelligent transportation because it makes the best use of cross-modality information, and feature-level fusion of millimeter-wave radar and camera data has become an active research topic. Existing methods apply only one fusion operation, such as concatenation, element-wise addition, or element-wise multiplication, when fusing radar and camera features. However, these operations are complementary and could be combined. To fill this gap, this paper proposes a two-stage feature attention fusion (TSAF) network for radar-camera 3D object detection, which uses multiplication fusion in the first stage and concatenation fusion in the second stage. In the first stage, a modified radar spatial attention module (RSAM) improves the utilization of radar features by reweighting the key radar information. In the second stage, a camera channel attention module (CCAM) enhances the key camera features. The proposed network is evaluated on the nuScenes dataset. Compared with the baseline, TSAF improves the mean average precision (mAP) from 35.3% to 38.8% and the nuScenes Detection Score (NDS) from 43.4% to 48.8%, which verifies the effectiveness of TSAF.
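The abstract describes TSAF only at block level. The NumPy sketch below shows one plausible reading of its data flow, under assumptions that do not come from the paper: radar features are already projected onto the image plane with the same shape (C, H, W) as the camera features; RSAM is approximated by CBAM-style spatial attention and CCAM by squeeze-and-excitation channel attention; and the second stage concatenates the stage-1 product with the attended camera features. The functions `radar_spatial_attention`, `camera_channel_attention`, and `tsaf_fuse` and the random weights are hypothetical, and nothing is trained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def radar_spatial_attention(radar, kernel):
    """Hypothetical RSAM stand-in (assumption): CBAM-style spatial attention.

    radar:  (C, H, W) radar features, assumed already aligned to the camera view.
    kernel: (2, k, k) conv weights over the [channel-mean, channel-max] maps, k odd.
    """
    k = kernel.shape[-1]
    assert k % 2 == 1, "odd kernel keeps H and W unchanged"
    pooled = np.stack([radar.mean(axis=0), radar.max(axis=0)])   # (2, H, W)
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))   # zero padding
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (2, H, W, k, k)
    logits = np.einsum("chwij,cij->hw", windows, kernel)        # 'same' 2-D conv
    mask = sigmoid(logits)                                       # (H, W), in (0, 1)
    return radar * mask[None, :, :]                              # reweight all channels


def camera_channel_attention(camera, w1, w2):
    """Hypothetical CCAM stand-in (assumption): squeeze-and-excitation attention.

    camera: (C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r.
    """
    squeezed = camera.mean(axis=(1, 2))      # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck -> (C // r,)
    weights = sigmoid(w2 @ hidden)           # per-channel weights in (0, 1)
    return camera * weights[:, None, None]


def tsaf_fuse(camera, radar, params):
    """Two-stage fusion sketch: multiplication first, concatenation second."""
    assert camera.shape == radar.shape, "assumes radar is aligned to camera shape"
    # Stage 1: element-wise multiplication with spatially attended radar features.
    stage1 = camera * radar_spatial_attention(radar, params["spatial_kernel"])
    # Stage 2 (assumed inputs): concatenate stage-1 output with attended camera features.
    camera_att = camera_channel_attention(camera, params["w1"], params["w2"])
    return np.concatenate([stage1, camera_att], axis=0)  # (2C, H, W)


# Usage with random features and weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 10, 4
camera = rng.standard_normal((C, H, W))
radar = rng.standard_normal((C, H, W))
params = {
    "spatial_kernel": rng.standard_normal((2, 7, 7)) * 0.1,
    "w1": rng.standard_normal((C // r, C)),
    "w2": rng.standard_normal((C, C // r)),
}
fused = tsaf_fuse(camera, radar, params)
print(fused.shape)  # (16, 6, 10)
```

In this sketch, multiplication lets the radar mask gate camera responses location by location, while concatenation keeps both streams intact for later layers to weigh. In a real detector, the attention weights would be learned convolution and linear layers rather than fixed arrays.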