Abstract: Recently, 3D object detection based on point clouds has developed rapidly. However, the sensor scans too few points on distant and occluded objects, so these objects lack sufficient features to be detected reliably, which degrades detection accuracy. We therefore construct a novel 3D object detection network, the Context-aware and Dimensional Interaction Attention Network (CIANet). It explores vital geometric cues to enrich the feature representation of objects and thereby boosts overall detection performance. Specifically, in the first stage, we employ 3D sparse convolution to extract voxel features, and then construct a Channel-Spatial Hybrid Attention (CSHA) module and a Contextual Self-Attention (CSA) module to enhance the voxel features for generating proposals. The CSHA module enhances the key information in the channel and spatial domains of the 2D Bird's Eye View (BEV) features, and the CSA module supplements the enhanced BEV features with contextual information, thus generating accurate proposals. In the second stage, we construct a Dimensional Interaction Attention (DIA) module to refine the Region of Interest (RoI) features within the proposals. It strengthens the interactions between the channel and spatial dimensions of the RoI features to learn accurate object boundaries for proposal refinement. Extensive experiments on the KITTI and Waymo benchmarks show that CIANet achieves superior detection performance compared to recent methods, especially for objects such as pedestrians and cyclists.
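The first-stage BEV enhancement described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the internal layout of CSHA or CSA, so two designs are assumed:

- `csha_sketch` assumes a CBAM-style design: a channel gate from average- and max-pooled descriptors through a small MLP, followed by a gate over every BEV cell.
- `csa_sketch` assumes single-head dot-product self-attention across all BEV cells with a residual connection.

The function names and the random matrices `w1`, `w2`, `wq`, `wk`, `wv` are hypothetical placeholders for learned parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def csha_sketch(bev, w1, w2):
    """Hypothetical CBAM-style channel-then-spatial attention on a BEV map (C, H, W).

    Assumption: w1 (C//r, C) and w2 (C, C//r) stand in for a learned channel MLP;
    the spatial branch sums the channel-mean and channel-max maps instead of
    applying a learned convolution.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # two-layer MLP with ReLU

    # Channel branch: squeeze H and W with average and max pooling.
    ch_gate = sigmoid(mlp(bev.mean(axis=(1, 2))) + mlp(bev.max(axis=(1, 2))))  # (C,)
    x = bev * ch_gate[:, None, None]
    # Spatial branch: squeeze channels, then gate every BEV cell.
    sp_gate = sigmoid(x.mean(axis=0) + x.max(axis=0))  # (H, W)
    return x * sp_gate[None, :, :]


def csa_sketch(bev, wq, wk, wv):
    """Hypothetical single-head self-attention over all H*W BEV cells, with a residual.

    Assumption: wq, wk (C, d) and wv (C, C) are placeholder projection matrices.
    """
    c, h, w = bev.shape
    tokens = bev.reshape(c, h * w).T  # (H*W, C), one token per BEV cell
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (H*W, H*W) global weights
    context = attn @ v  # (H*W, C) context gathered from every cell
    return bev + context.T.reshape(c, h, w)  # add the context back onto each cell


rng = np.random.default_rng(0)
C, H, W, r, d = 8, 4, 5, 2, 4
bev = rng.standard_normal((C, H, W))
enhanced = csha_sketch(bev, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
out = csa_sketch(enhanced, rng.standard_normal((C, d)), rng.standard_normal((C, d)),
                 rng.standard_normal((C, C)) * 0.1)
print(out.shape)  # (8, 4, 5)
```

Because both CSHA gates lie in (0, 1), that branch can only rescale features. CSA instead adds a context term computed from every cell, which illustrates how context aggregation lets sparsely observed cells draw on features elsewhere in the scene.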


### What problem does this paper attempt to solve?

In point-cloud-based 3D object detection, the sensor scans too few points on distant and occluded objects. This paper addresses the resulting lack of features for these objects. Specifically:

1. **Detection challenges for distant and occluded objects**:
   - In real-world scenes, LiDAR sensors capture too few points on distant or occluded objects to fully depict their boundaries, so these objects lack sufficient spatial features.
   - As a result, existing 3D object detection methods struggle to detect these weak objects accurately, which lowers overall detection performance.
2. **Limitations of existing methods**:
   - **View-based methods**: They project the point cloud onto a 2D view so that mature 2D CNNs can be used, but the projection loses 3D spatial information and limits detection performance.
   - **Point-based methods**: They extract features directly from the raw point cloud, but at high computational cost and with slow inference.
   - **Voxel-based methods**: They convert the point cloud into regular voxels and extract features with 3D sparse convolution. This improves computational speed but can lose spatial geometric information, which reduces detection accuracy.

### Solutions proposed in the paper

To solve these problems, the authors propose CIANet (Context-aware and Dimensional Interaction Attention Network), a new 3D object detection network built on context-aware and dimensional interaction attention mechanisms. The specific improvements are:

1. **First stage**:
   - Use 3D sparse convolution to extract voxel features.
   - Construct a **Channel-Spatial Hybrid Attention (CSHA) module** to enhance the key information of the BEV features.
   - Construct a **Contextual Self-Attention (CSA) module** to supplement the enhanced BEV features with global spatial context and generate high-quality candidate boxes (proposals).
2. **Second stage**:
   - Use voxel RoI pooling to capture the RoI features within the candidate boxes.
   - Construct a **Dimensional Interaction Attention (DIA) module** to enhance the interaction between the spatial and channel dimensions of the RoI features. This helps the network learn more accurate object boundaries and thus refine the candidate boxes.

### Main contributions

1. **CSHA and CSA modules**: In the first stage, these two modules enhance the key channel-spatial features and aggregate rich global context information to generate more accurate candidate boxes.
2. **DIA module**: In the second stage, this module models the interaction between the channel and spatial dimensions, enhances the RoI features, and further refines the candidate boxes into the final detection boxes.
3. **Experimental results**: CIANet performs strongly on the KITTI and Waymo benchmarks. In particular, it outperforms other advanced methods on small, weak objects such as pedestrians and cyclists.

By introducing these attention mechanisms, CIANet focuses better on the boundary information of weak objects in real-world scenes, thereby improving overall detection performance.
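The second-stage refinement can be sketched in the same hedged way. The summary says only that DIA enhances the interaction between the channel and spatial dimensions of RoI features, so `dia_sketch` below is a hypothetical design and not the paper's module. A spatial attention over the pooled grid points produces the channel descriptor. The channel-gated features then produce the spatial gate, so each dimension's weights depend on the other. The 6x6x6 grid size and the weights `w_s` and `w_c` are assumptions standing in for the voxel RoI pooling output and learned parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()


def dia_sketch(roi, w_s, w_c):
    """Hypothetical dimensional-interaction attention for a single RoI.

    roi: (N, C) features at N pooled grid points (assumed N = 6**3).
    w_s: (C,) placeholder weights that score each grid point from its channels.
    w_c: (C, C) placeholder channel-mixing weights.
    """
    # Spatial -> channel: pool channels with attention weights over grid points,
    # so the channel descriptor depends on which points look informative.
    a = softmax(roi @ w_s)  # (N,), sums to 1
    desc = a @ roi  # (C,) spatially weighted channel descriptor
    ch_gate = sigmoid(w_c @ desc)  # (C,)
    gated = roi * ch_gate[None, :]
    # Channel -> spatial: re-score grid points from the channel-gated features,
    # so the spatial gate depends on which channels were emphasised.
    sp_gate = sigmoid(gated @ w_s)  # (N,)
    return roi + gated * sp_gate[:, None]  # residual keeps the original RoI features


rng = np.random.default_rng(1)
N, C = 6 ** 3, 16
roi = rng.standard_normal((N, C))
refined = dia_sketch(roi, rng.standard_normal(C) * 0.1, rng.standard_normal((C, C)) * 0.1)
print(refined.shape)  # (216, 16)
```

With all-zero weights every gate equals 0.5 and the output reduces to `1.25 * roi`, which is a quick sanity check that the residual path is intact.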

Improving 3D Object Detection with Context-Aware and Dimensional Interaction Attention