A Single-Stage 3D Object Detection Method Based on Sparse Attention Mechanism.

Songche Jia, Zhenyu Zhang
DOI: https://doi.org/10.1007/978-981-99-8435-0_33
2024-01-01
Abstract: The Bird’s Eye View (BEV) feature extraction module is an important part of 3D object detection from point cloud data. However, existing methods ignore the correlation between objects, so a large amount of irrelevant information takes part in feature extraction and detection accuracy suffers. To address this, the paper proposes a BEV feature extraction method named Dynamic Extraction Of Effective Features (DEF) and designs a single-stage 3D object detection model around it. The method first uses convolution operations to extract local features. Spatial attention then redistributes the weights of the elements in the BEV feature map, highlighting the positions of critical elements. Next, a sparse two-level routing attention mechanism is applied globally to select the top-k routing regions most strongly correlated with the target region, which avoids interference from irrelevant information. Finally, token-to-token attention is applied to the joint top-k routing regions to extract effective features. Results on the KITTI benchmark dataset show that the method can effectively improve 3D object detection accuracy.
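The attention stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`spatial_attention`, `to_regions`, `routed_attention`), the region grid size `s`, the identity query/key/value projections, and the sum-then-sigmoid gate standing in for a learned fusion layer are all assumptions made for illustration, and the initial convolution stage is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """Reweight every BEV cell of a (H, W, C) feature map."""
    avg = feat.mean(axis=-1)   # per-cell average over channels
    mx = feat.max(axis=-1)     # per-cell maximum over channels
    # Assumption: a plain sum replaces the learned layer that would
    # normally fuse the two maps before the sigmoid gate.
    gate = 1.0 / (1.0 + np.exp(-(avg + mx)))
    return feat * gate[..., None]

def to_regions(feat, s):
    """Split (H, W, C) into (s*s regions, tokens per region, C)."""
    h, w, c = feat.shape
    rh, rw = h // s, w // s
    x = feat.reshape(s, rh, s, rw, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(s * s, rh * rw, c)

def routed_attention(feat, s=4, k=2):
    """Region-level top-k routing followed by token-to-token attention."""
    # Assumption: identity projections, so queries, keys and values
    # are all the region tokens themselves.
    q = keys = vals = to_regions(feat, s)
    n, t, c = q.shape

    # Level 1: region-to-region affinity from mean-pooled tokens.
    q_r = q.mean(axis=1)                          # (n, c)
    k_r = keys.mean(axis=1)                       # (n, c)
    affinity = q_r @ k_r.T                        # (n, n)
    topk = np.argsort(-affinity, axis=1)[:, :k]   # (n, k) routed regions

    # Level 2: gather tokens of the routed regions, then attend.
    k_g = keys[topk].reshape(n, k * t, c)         # (n, k*t, c)
    v_g = vals[topk].reshape(n, k * t, c)
    attn = softmax(q @ k_g.transpose(0, 2, 1) / np.sqrt(c), axis=-1)
    return attn @ v_g, topk                       # (n, t, c), (n, k)

rng = np.random.default_rng(0)
bev = rng.standard_normal((16, 16, 8))            # toy BEV feature map
weighted = spatial_attention(bev)
out, routes = routed_attention(weighted, s=4, k=2)
```

Because each region attends only to the tokens of its `k` routed regions rather than to all `s*s` regions, the attention matrix per region has `k*t` columns instead of `s*s*t`, which is where the sparsity comes from in this sketch.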