LRCFormer: lightweight transformer based radar-camera fusion for 3D target detection
Xiaohong Huang, Kunqiang Xu, Ziran Tian
DOI: https://doi.org/10.1007/s11760-024-03595-2
IF: 1.583
2024-12-05
Signal Image and Video Processing
Abstract: Sensor performance degrades under complex environmental conditions, and traditional 3D object detection models suffer from low detection efficiency owing to their inherent complexity. To address both problems, we propose a lightweight 3D object detection model based on an improved REDFormer, dubbed "LRCFormer". First, to enhance computational efficiency and mitigate the vanishing gradient problem, a piecewise linear activation function is introduced into the radar backbone network. This network and the image encoder extract radar features and multi-scale image features, respectively. Next, we propose an improved spatio-temporal encoding fusion module. This module replaces the traditional multi-head attention mechanism with a single-head attention mechanism and incorporates multi-scale pooling feature extraction into the temporal attention module, thus improving the efficiency of processing time-series data. Furthermore, a multi-scale fusion network replaces the feed-forward network in the original encoder, thereby effectively integrating features of different resolutions. Finally, the detection head performs the 3D object detection task. Experimental results on the public nuScenes dataset show that the model is not only smaller (13.2M fewer parameters than the baseline) but also achieves better detection accuracy than state-of-the-art (SOTA) models, with a mean average precision (mAP) increase of 0.7% and a nuScenes detection score (NDS) increase of 1.8%. In rainy and nighttime scenarios in particular, mAP improves by 2% and 1.9%, and NDS by 0.5% and 4.5%, respectively.
engineering, electrical & electronic; imaging science & photographic technology
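
Two of the efficiency-oriented ideas named in the abstract, a piecewise linear activation and single-head attention in place of multi-head attention, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the abstract does not say which piecewise linear function is used (a leaky ReLU stands in for it), and the attention shown is plain scaled dot-product attention without learned projections, not the authors' spatio-temporal module. The function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.01):
    """Piecewise linear activation: identity for x >= 0, a small slope otherwise.

    Assumption: a stand-in only; the abstract does not name the function used.
    """
    return x if x >= 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(queries, keys, values):
    """Scaled dot-product attention with a single head.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    queries, keys, values are lists of equal-length float vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs
```

With a single head there is one attention map per query instead of one per head, so the per-head split, the parallel attention computations, and the concatenation step of multi-head attention all disappear. That is the general intuition behind the parameter and compute savings the abstract reports, although the exact savings depend on details the abstract does not give.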