IMAM: Incorporating Multiple Attention Mechanisms for 3D Object Detection from Point Cloud

Jing Zhou, Han Wu
DOI: https://doi.org/10.1007/978-3-031-44195-0_10
2023-01-01
Abstract: 3D object detection from point clouds has advanced rapidly. However, real point cloud scenes contain many small objects that are hard to detect because they are covered by few points, which lowers overall detection accuracy. To address this issue, we propose a novel two-stage 3D object detection method that introduces an attention strategy to enhance the key structural information of objects and thereby improve overall detection accuracy, especially for small objects. Specifically, in the first stage, we employ the convolutional block attention module on the 3D sparse convolution layer to extract voxel features, and further apply the Swin Transformer to enhance Bird's Eye View (BEV) features for generating high-quality proposals. Then, in the second stage, we apply a Voxel Set Abstraction (VSA) module to fuse voxel features and BEV features into keypoint features, followed by a Region of Interest (RoI) pooling module that produces grid features for confidence prediction and box regression. Experimental results on the KITTI dataset show that our method, IMAM, achieves excellent detection performance, especially for small objects such as pedestrians and cyclists.
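To make the first-stage attention step concrete, the following is a minimal NumPy sketch of a convolutional block attention module (channel attention followed by spatial attention). It is an illustration under stated assumptions, not the authors' implementation: it operates on a small dense (C, H, W) feature map rather than on 3D sparse convolution features, the weights are random instead of learned, and the names `channel_attention`, `spatial_attention`, `cbam`, `w1`, `w2`, and `kernel` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1) from average- and max-pooled descriptors.

    feat: (C, H, W); w1: (C // r, C); w2: (C, C // r) form a shared two-layer MLP.
    """
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """Per-location weights in (0, 1) from channel-wise average and max maps.

    kernel: (2, k, k) convolution filter with odd k, applied with zero padding.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = pooled.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)


def cbam(feat, w1, w2, kernel):
    """Reweight channels first, then spatial locations."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]


# Toy example with random (untrained) weights; all sizes are assumptions.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
```

Because both attention maps lie in (0, 1), the module only rescales features: the output keeps the input shape, and in a trained network the locations and channels that matter for an object receive weights closer to 1.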