Abstract: A voxel-based single-shot multi-model network for 3D object detection, named AVIFF, is introduced. The authors explore new ways of fusing point cloud and image features by designing an adaptive feature fusion (AFF) module and a dense fusion (DF) module. They also extend the GIoU loss to 3D space to improve the framework's perception of object localisation and rotation.

The multifaceted nature of sensor data has long been an obstacle to exploiting its full potential in 3D object detection. Although using point clouds as input has yielded excellent results, effectively combining the complementary properties of multi-sensor data remains a major challenge. This work presents a new approach to multi-model 3D object detection, called adaptive voxel-image feature fusion (AVIFF). AVIFF is an end-to-end single-shot framework that dynamically and adaptively fuses point cloud and image features, yielding a more comprehensive and integrated analysis of camera and LiDAR sensor data. The AFF module fuses spatialised image features with voxel-based point cloud features, while the DF module uses a heterogeneous architecture to preserve the distinctive characteristics of 3D point cloud data. The framework also features a generalised intersection over union (GIoU) loss function that enhances the perception of object localisation and rotation in 3D space. Comprehensive experiments validate the effectiveness of the proposed modules.
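The abstract mentions a generalised intersection over union (GIoU) loss in 3D space but does not give its formulation. As a rough illustration of the general idea only, the sketch below computes the standard GIoU (IoU minus the fraction of the smallest enclosing box not covered by the union) for axis-aligned 3D boxes. The axis-aligned simplification, the box layout `(x_min, y_min, z_min, x_max, y_max, z_max)`, and the function names are assumptions made for this example; the rotation-aware loss used in AVIFF is not reproduced here.

```python
def giou_3d_axis_aligned(box_a, box_b):
    """GIoU of two axis-aligned 3D boxes with positive volume.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    Assumption: no rotation is handled; this is a simplified sketch.
    """
    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    inter = 1.0    # volume of the intersection
    enclose = 1.0  # volume of the smallest axis-aligned enclosing box
    for i in range(3):
        lo = max(box_a[i], box_b[i])
        hi = min(box_a[i + 3], box_b[i + 3])
        inter *= max(hi - lo, 0.0)  # zero if the boxes do not overlap on this axis
        enclose *= max(box_a[i + 3], box_b[i + 3]) - min(box_a[i], box_b[i])

    union = volume(box_a) + volume(box_b) - inter
    iou = inter / union
    # Penalise the empty space inside the enclosing box.
    return iou - (enclose - union) / enclose


def giou_loss_3d(box_a, box_b):
    """Loss in [0, 2]: 0 for identical boxes, larger for distant boxes."""
    return 1.0 - giou_3d_axis_aligned(box_a, box_b)
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, this quantity keeps decreasing as the boxes move apart, so the loss still varies with box position when the prediction and the target do not overlap.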

Voxel Field Fusion for 3D Object Detection

VoxelNextFusion: A Simple, Unified and Effective Voxel Fusion Framework for Multi-Modal 3D Object Detection

Dense Voxel Fusion for 3D Object Detection

MVX-Net: Multimodal VoxelNet for 3D Object Detection

SDVRF: Sparse-to-Dense Voxel Region Fusion for Multi-modal 3D Object Detection

End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds

Point-Voxel Fusion for Multimodal 3D Detection

3D Object Detection under Urban Road Traffic Scenarios Based on Dual-Layer Voxel Features Fusion Augmentation

A novel multi‐model 3D object detection framework with adaptive voxel‐image feature fusion

DLFusion: Painting-Depth Augmenting-LiDAR for Multimodal Fusion 3D Object Detection

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion

Cascaded Cross-Modality Fusion Network for 3D Object Detection

Cross-Modality 3D Object Detection

FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection

AFMCT: adaptive fusion module based on cross-modal transformer block for 3D object detection

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection

Multi-Sem Fusion: Multimodal Semantic Fusion for 3-D Object Detection

DVFENet: Dual-branch voxel feature extraction network for 3D object detection