CAF-RCNN: multimodal 3D object detection with cross-attention

Junting Liu, Deer Liu, Lei Zhu
DOI: https://doi.org/10.1080/01431161.2023.2261151
IF: 3.531
2023-10-14
International Journal of Remote Sensing
Abstract: LiDAR and cameras are pivotal sensors for 3D (three-dimensional) object detection. Because their characteristics differ, a growing number of multimodal object detection methods have been proposed. Popular methods currently rely on hard association between camera features and LiDAR features, but because the features are frequently enhanced and aggregated, aligning the two kinds of features effectively remains a major challenge. We therefore propose CAF-RCNN. Building on PointRCNN, it uses a Feature Pyramid Network (FPN) to extract high-level semantic features at different scales, then fuses these features with the LiDAR features output by the Set Abstraction (SA) module in PointRCNN and in subsequent steps. For feature fusion, we design a module based on the cross-attention mechanism, CAFM (Cross-Attention Fusion Module). It combines two channel attention streams in a cross-over fashion to exploit rich details about significant objects in the Image Stream and the Geometric Stream. Extensive experiments on the KITTI dataset show that our method is 6.43% higher than PointRCNN in 3D accuracy.
imaging science & photographic technology, remote sensing
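
The abstract describes CAFM only at a high level: two channel attention streams combined in a cross-over fashion. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes squeeze-and-excitation style channel attention, per-point image features already sampled at the projected LiDAR point locations, and fusion by concatenation. All names (channel_attention, cross_attention_fuse, init_params) and the layer sizes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feats, w1, w2):
    """Squeeze-and-excitation style weights (an assumed design).

    feats: (N, C) per-point features; returns (C,) weights in (0, 1).
    """
    squeezed = feats.mean(axis=0)            # squeeze: average over the N points
    hidden = np.maximum(squeezed @ w1, 0.0)  # bottleneck layer with ReLU
    return sigmoid(hidden @ w2)              # one weight per channel


def cross_attention_fuse(img_feats, pt_feats, params):
    """Gate each stream with the channel weights of the *other* stream.

    img_feats, pt_feats: (N, C) arrays aligned point by point.
    Returns (N, 2C): gated geometric features followed by gated image features.
    """
    a_img = channel_attention(img_feats, params["img_w1"], params["img_w2"])
    a_pt = channel_attention(pt_feats, params["pt_w1"], params["pt_w2"])
    pt_gated = pt_feats * a_img    # image stream re-weights geometric channels
    img_gated = img_feats * a_pt   # geometric stream re-weights image channels
    return np.concatenate([pt_gated, img_gated], axis=1)


def init_params(channels, reduction=4, seed=0):
    """Random weights for illustration only; a real model would learn them."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)
    shapes = {
        "img_w1": (channels, hidden), "img_w2": (hidden, channels),
        "pt_w1": (channels, hidden), "pt_w2": (hidden, channels),
    }
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}


if __name__ == "__main__":
    params = init_params(channels=8)
    img = np.ones((5, 8))   # stand-in for image features sampled at 5 points
    pts = np.ones((5, 8))   # stand-in for SA-module features of the same points
    print(cross_attention_fuse(img, pts, params).shape)
```

The cross-over step is the two gating lines: each stream's channels are scaled by attention weights computed from the other modality, so that neither stream is re-weighted by its own statistics alone. How the paper aligns FPN scales with SA layers is not specified in the abstract and is left out here.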