Abstract: LiDAR and camera fusion techniques are promising for 3-D object detection in autonomous driving (AD). Most multimodal 3-D object detection frameworks integrate semantic knowledge from 2-D images into 3-D LiDAR point clouds to enhance detection accuracy. Nevertheless, the restricted resolution of 2-D feature maps impedes accurate reprojection and often induces a pronounced boundary-blurring effect, which is primarily attributed to erroneous semantic segmentation. To address these limitations, we present the multi-sem fusion (MSF) framework, a versatile multimodal fusion approach that employs 2-D and 3-D semantic segmentation methods to generate parsing results for both modalities. The 2-D semantic information is then reprojected into the 3-D point cloud using calibration parameters. To tackle misalignment between the 2-D and 3-D parsing results, we introduce an adaptive attention-based fusion (AAF) module that fuses them by learning an adaptive fusion score. The point cloud with the fused semantic labels is then sent to the downstream 3-D object detectors. Furthermore, we propose a deep feature fusion (DFF) module that aggregates deep features at different levels to boost the final detection performance. The effectiveness of the framework has been verified on two public large-scale 3-D object detection benchmarks by comparing it with different baselines. The experimental results show that the proposed fusion strategies significantly improve detection performance compared with methods using only point clouds and methods using only 2-D semantic information. Moreover, our approach can be integrated as a plug-in into any detection framework.
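
The reprojection and fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single combined projection matrix `proj`, and the sigmoid gate parameterized by `weight` and `bias` are assumptions made here for illustration, and the gate is a simplified stand-in for the learned AAF module.

```python
import numpy as np


def project_semantics(points, seg_scores, proj):
    """Sample 2-D class scores for each 3-D point (hypothetical helper).

    points:     (N, 3) LiDAR points in the frame that `proj` expects
    seg_scores: (H, W, C) per-pixel class scores from a 2-D segmenter
    proj:       (3, 4) projection matrix (assumed: intrinsics @ extrinsics)
    Returns (N, C) scores and an (N,) boolean mask of points that hit the image.
    """
    h, w, c = seg_scores.shape
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous
    uvw = homo @ proj.T                                     # (N, 3)
    depth = uvw[:, 2]
    in_front = depth > 1e-6                                 # drop points behind the camera
    safe_depth = np.where(in_front, depth, 1.0)             # avoid division by zero
    u = np.floor(uvw[:, 0] / safe_depth).astype(int)        # pixel column
    v = np.floor(uvw[:, 1] / safe_depth).astype(int)        # pixel row
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros((len(points), c))
    out[valid] = seg_scores[v[valid], u[valid]]
    return out, valid


def adaptive_fuse(sem_2d, sem_3d, weight, bias, valid):
    """Blend 2-D and 3-D scores with a per-point gate (simplified assumption).

    weight: (2C,) and bias: scalar would be learned in practice.
    Points without a valid image projection fall back to the 3-D scores.
    """
    feats = np.concatenate([sem_2d, sem_3d], axis=1)        # (N, 2C)
    alpha = 1.0 / (1.0 + np.exp(-(feats @ weight + bias)))  # (N,) fusion score
    alpha = np.where(valid, alpha, 0.0)[:, None]            # (N, 1)
    return alpha * sem_2d + (1.0 - alpha) * sem_3d
```

In this sketch the fused scores would be appended to each point's coordinates before the point cloud is passed to a 3-D detector; the gate lets the 3-D parsing result dominate wherever the reprojected 2-D label is unreliable or missing.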
