Abstract: With the advent of autonomous vehicle applications, 3D object detection from LiDAR point clouds has become critically important. Recent studies have demonstrated that methods that aggregate features from voxels can detect objects accurately and efficiently in large, complex 3D scenes. Nevertheless, most of these methods do not filter background points well and perform poorly on small objects. To address these issues, this paper proposes an Attention-based and Multiscale Feature Fusion Network (AMFF-Net), which uses a Dual-Attention Voxel Feature Extractor (DA-VFE) and a Multi-scale Feature Fusion (MFF) Module to improve the precision and efficiency of 3D object detection. The DA-VFE integrates pointwise and channelwise attention into the Voxel Feature Extractor (VFE) to enhance key point cloud information within each voxel and produce more representative voxel features. The MFF Module consists of self-calibrated convolutions, a residual structure, and a coordinate attention mechanism. It acts as the 2D backbone, expanding the receptive field and capturing more contextual information, which helps localize small objects, strengthens the feature-extraction capability of the network, and reduces the computational overhead. We evaluated the proposed model on the nuScenes dataset, which covers a large number of driving scenarios. The experimental results showed that AMFF-Net achieved an mAP of 62.8%, significantly improving small object detection over the baseline network and significantly reducing the computational overhead, while the inference speed remained essentially the same. AMFF-Net also achieved advanced performance on the KITTI dataset.
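
The idea of combining pointwise and channelwise attention inside a voxel feature extractor can be illustrated with a minimal sketch. This is not the authors' DA-VFE: the function name `dual_attention_voxel`, the weights `w_point` and `w_channel`, the sigmoid gating, and the max-pooling are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dual_attention_voxel(points, w_point, w_channel):
    """Toy dual-attention feature for ONE voxel (illustrative only).

    points:    list of per-point feature vectors (each of length C)
    w_point:   hypothetical weights (length C) that score each point
    w_channel: hypothetical C x C weights that score each channel
    Returns a single voxel feature vector of length C.
    """
    n_ch = len(w_point)
    if not points:
        # Empty voxel: return a zero feature.
        return [0.0] * n_ch

    # Pointwise attention: one gate in (0, 1) per point.
    point_gates = [
        sigmoid(sum(f * w for f, w in zip(p, w_point))) for p in points
    ]

    # Channelwise attention: one gate in (0, 1) per channel,
    # computed from a max-pooled descriptor of the voxel.
    pooled = [max(p[c] for p in points) for c in range(n_ch)]
    channel_gates = [
        sigmoid(sum(pooled[k] * w_channel[k][c] for k in range(n_ch)))
        for c in range(n_ch)
    ]

    # Reweight every point feature by both gates, then max-pool
    # over the points to obtain the voxel feature.
    weighted = [
        [p[c] * g * channel_gates[c] for c in range(n_ch)]
        for p, g in zip(points, point_gates)
    ]
    return [max(row[c] for row in weighted) for c in range(n_ch)]
```

In this sketch, points with low gate values (for example, background points) contribute less to the pooled voxel feature, which is the intuition behind attention-based filtering. A real detector would learn the weights and operate on batched tensors.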

VSL-Net: Voxel structure learning for 3D object detection

SSF: Sparse Point Cloud Object Detection Based on Self-Adaptive Voxel Encoding and Focal-Sparse Convolution

AVFP-MVX: Multimodal VoxelNet with Attention Mechanism and Voxel Feature Pyramid

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

F-PVNet: Frustum-Level 3-D Object Detection on Point–Voxel Feature Representation for Autonomous Driving

MSPV3D: Multi-Scale Point-Voxels 3D Object Detection Net

SIEV-Net: A Structure-Information Enhanced Voxel Network for 3D Object Detection From LiDAR Point Clouds

DVFENet: Dual-branch voxel feature extraction network for 3D object detection

VP-Net: Voxels as Points for 3D Object Detection

SVGA-Net: Sparse Voxel-Graph Attention Network for 3D Object Detection from Point Clouds

Multiclass objects detection algorithm using DarkNet-53 and DenseNet for intelligent vehicles

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

Semantic-aware 3D-voxel CenterNet for point cloud object detection

Region-proposal Convolutional Network-driven Point Cloud Voxelization and Over-segmentation for 3D Object Detection

VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

VIN: Voxel-based Implicit Network for Joint 3D Object Detection and Segmentation for Lidars

Spatial Information Enhancement with Multi-Scale Feature Aggregation for Long-Range Object and Small Reflective Area Object Detection from Point Cloud

AMFF-Net: An Effective 3D Object Detector Based on Attention and Multi-Scale Feature Fusion

3D Object Detection under Urban Road Traffic Scenarios Based on Dual-Layer Voxel Features Fusion Augmentation

SDVRF: Sparse-to-Dense Voxel Region Fusion for Multi-modal 3D Object Detection