Abstract: Background: Because they refine regions of interest (RoIs), two-stage 3D detection algorithms usually outperform most single-stage detectors. However, most two-stage methods use feature connection to aggregate the grid point features obtained by multi-scale RoI pooling in the second stage. This connection mode does not consider the correlation between multi-scale grid features. Methods: In the first stage, we employ 3D sparse convolution and 2D convolution to extract rich semantic features. A region proposal network (RPN) then predicts a small number of coarse RoIs on the generated bird’s eye view (BEV) map. After that, we adopt a voxel RoI-pooling strategy to aggregate, for each grid point in an RoI, the neighboring non-empty voxel features from the last two layers of the 3D sparse convolution. In this way, we obtain two aggregated features from the 3D sparse voxel space for each grid point. Next, we design an attention feature fusion module. This module includes a local and a global attention layer, which together integrate the grid point features from the different voxel layers. Results: We carry out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. The average precisions of our proposed method are 88.21%, 81.51%, and 77.07% on the easy, moderate, and hard difficulty levels, respectively, for 3D detection, and 92.30%, 90.19%, and 86.00% on the same three levels for BEV detection. Conclusions: In this paper, we propose a novel two-stage 3D detection algorithm for point clouds named Grid Attention Fusion Region-based Convolutional Neural Network (GAF-RCNN). Because we integrate multi-scale RoI grid features with an attention mechanism in the refinement stage, the multi-scale features can be better correlated, and the method achieves performance competitive with other well-tested detection algorithms.
This 3D object detection method has important implications for robot and cobot technology.
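
The abstract does not specify how the local and global attention layers are built, so the following is only a minimal sketch of the general idea of gated fusion between two sets of grid point features. It assumes a simple design that is not taken from the paper: a local term computed per grid point, a global term computed from the mean over all grid points in an RoI, and a sigmoid gate that blends the two inputs. The function name `attention_fuse` and the per-channel weights `w_local` and `w_global` are hypothetical; in a real network the weights would be learned layers.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(feat_a, feat_b, w_local, w_global):
    """Blend two grid point feature sets for one RoI with a sigmoid gate.

    feat_a, feat_b: lists of grid points, each a list of C channel values,
        standing in for features pooled from two voxel layers.
    w_local, w_global: per-channel scalar weights (hypothetical stand-ins
        for learned attention layers; an assumption, not the paper's design).
    """
    num_points = len(feat_a)
    channels = len(feat_a[0])
    # Element-wise sum of the two inputs drives both attention terms.
    summed = [[a + b for a, b in zip(pa, pb)] for pa, pb in zip(feat_a, feat_b)]
    # Global context: per-channel mean over all grid points in the RoI.
    global_ctx = [sum(p[c] for p in summed) / num_points for c in range(channels)]
    fused = []
    for pa, pb, ps in zip(feat_a, feat_b, summed):
        row = []
        for c in range(channels):
            # Local term depends on this grid point, global term on the whole RoI.
            gate = sigmoid(w_local[c] * ps[c] + w_global[c] * global_ctx[c])
            row.append(gate * pa[c] + (1.0 - gate) * pb[c])
        fused.append(row)
    return fused


# Illustrative values only: 2 grid points with 2 channels each.
feat_a = [[1.0, 2.0], [3.0, 4.0]]
feat_b = [[0.0, 0.0], [1.0, -1.0]]
fused = attention_fuse(feat_a, feat_b, [0.5, 0.5], [0.5, 0.5])
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the two inputs, so the output stays within the range spanned by the corresponding input values. This contrasts with plain concatenation, which leaves any weighting between scales to later layers.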

CAF-RCNN: multimodal 3D object detection with cross-attention

PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

Cascaded Cross-Modality Fusion Network for 3D Object Detection

Cascade fusion of multi-modal and multi-source feature fusion by the attention for three-dimensional object detection

BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera

GAF-RCNN: Grid Attention Fusion 3D Object Detection from Point Cloud

3D Object Detection Based on Attention and Multi-Scale Feature Fusion

Anti-Noise 3D Object Detection of Multimodal Feature Attention Fusion Based on PV-RCNN

Multi-scale Feature Fusion with Point Pyramid for 3D Object Detection

AFMCT: adaptive fusion module based on cross-modal transformer block for 3D object detection

Multiattention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving

GA-RCNN: Graph self-attention feature extraction for 3D object detection

Multi-modal 3D object detection by 2D-guided precision anchor proposal and multi-layer fusion

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

Improving 3D Object Detection with Context-Aware and Dimensional Interaction Attention

Channelwise and Spatially Guided Multimodal Feature Fusion Network for 3-D Object Detection in Autonomous Vehicles

ACF-Net: Asymmetric Cascade Fusion for 3D Detection with LiDAR Point Clouds and Images

Three-Dimensional Point Cloud Object Detection Based on Feature Fusion and Enhancement

From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion

HCNET: A Point Cloud Object Detection Network Based on Height and Channel Attention