Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework

Eliraz Orfaig, Inna Stainvas, Igal Bilik

2024-06-05

Abstract: Vision-based autonomous driving requires reliable and efficient object detection. This work proposes a DiffusionDet-based framework that fuses data from a monocular camera and a depth sensor to obtain RGB and depth (RGB-D) data. During training, ground truth bounding boxes are randomly perturbed with noise, allowing the model to learn the reverse diffusion process that removes this noise. At inference, the system progressively refines a randomly generated set of boxes, guiding them toward accurate final detections. By integrating the textural and color features of RGB images with the spatial depth information from the LiDAR sensor, the proposed framework employs a feature fusion that substantially enhances detection of automotive targets. Comprehensive experiments on the KITTI dataset show a gain of $2.3$ AP in detecting automotive targets and, in particular, improved detection of small objects.
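The training-time corruption of ground truth boxes described in the abstract can be illustrated with a minimal sketch of a standard forward diffusion step applied to box coordinates. The box format (normalized center x, center y, width, height), the cosine schedule, the `scale` value, and the function names `cosine_alpha_bar` and `noise_boxes` are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal-retention schedule in cosine form (an assumed choice)."""
    t = np.linspace(0, num_steps, num_steps + 1) / num_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]  # starts at 1 and decreases toward 0

def noise_boxes(boxes, t, alpha_bar, scale=2.0, rng=None):
    """Corrupt normalized (cx, cy, w, h) boxes at diffusion step t.

    boxes: array of shape (N, 4) with values in [0, 1].
    scale: assumed signal scaling factor, not taken from the source.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = (boxes * 2.0 - 1.0) * scale          # map [0, 1] to [-scale, scale]
    eps = rng.standard_normal(x0.shape)       # Gaussian noise
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    xt = np.clip(xt, -scale, scale)           # keep boxes inside the valid range
    return (xt / scale + 1.0) / 2.0           # map back to [0, 1]

# Hypothetical usage: two ground truth boxes noised at a late step.
alpha_bar = cosine_alpha_bar(1000)
gt_boxes = np.array([[0.50, 0.50, 0.20, 0.30],
                     [0.25, 0.70, 0.10, 0.10]])
noisy = noise_boxes(gt_boxes, t=800, alpha_bar=alpha_bar,
                    rng=np.random.default_rng(0))
```

At step 0 the boxes are returned unchanged, and at late steps they approach pure noise. A detector trained to undo this corruption can then start from random boxes at inference.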

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to improve object detection for autonomous vehicles by integrating RGB images with depth sensor data. Specifically, it proposes a method based on the DiffusionDet framework that uses RGB and depth (RGB-D) data from a monocular camera and a depth sensor. The main contributions of the paper are as follows:

1. **FusedDiffusionDet Architecture**: Extends the DiffusionDet model to multi-modal RGB-D data. Noise is added to the bounding boxes during training, and the model learns to progressively refine the positions and sizes of randomly generated boxes until they closely cover the target objects.
2. **Feature Fusion Study**: Conducts ablation experiments on several feature fusion architectures to compare fusion strategies for object detection.
3. **Efficient Training**: Effectively trains the FusedDiffusionDet network using only refinement operations.

Experiments on the KITTI dataset show improvements of 3.7% in detecting small objects (such as pedestrians) and 2.9% in detecting large objects (such as vans). These results indicate that integrating RGB images and depth information can improve the robustness and accuracy of object detection in complex urban environments.
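To make the idea of comparing fusion strategies concrete, the sketch below shows two generic options: input-level fusion (stacking depth as a fourth image channel) and feature-level fusion (concatenating feature maps and mixing them with a 1x1 projection). These are common patterns chosen for illustration, not the specific architectures evaluated in the paper; the function names and tensor shapes are hypothetical.

```python
import numpy as np

def early_fusion(rgb, depth):
    """Stack an (H, W, 3) RGB image and an (H, W) depth map into (H, W, 4)."""
    return np.concatenate([rgb, depth[..., None]], axis=-1)

def feature_fusion(feat_rgb, feat_depth, weight):
    """Fuse (C1, H, W) and (C2, H, W) feature maps into (C_out, H, W).

    weight: (C_out, C1 + C2) matrix acting as a 1x1 convolution, which
    mixes channels independently at every spatial position.
    """
    stacked = np.concatenate([feat_rgb, feat_depth], axis=0)
    return np.einsum("oc,chw->ohw", weight, stacked)

# Hypothetical usage with random tensors standing in for real data.
rng = np.random.default_rng(0)
rgbd = early_fusion(rng.random((4, 6, 3)), rng.random((4, 6)))
fused = feature_fusion(rng.random((8, 4, 6)), rng.random((4, 4, 6)),
                       rng.random((16, 12)))
```

Input-level fusion lets a single backbone process both modalities, while feature-level fusion keeps a separate backbone per modality and merges their outputs later. An ablation study of the kind described above compares such placements.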

3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion

RGB-LiDAR fusion for accurate 2D and 3D object detection

Real time object detection using LiDAR and camera fusion for autonomous driving

Real-Time Vehicle Detection Framework Based on the Fusion of LiDAR and Camera

DyFusion: Cross-Attention 3D Object Detection with Dynamic Fusion

RI-Fusion: 3D Object Detection Using Enhanced Point Features With Range-Image Fusion for Autonomous Driving.

RangeLVDet: Boosting 3D Object Detection in LIDAR With Range Image and RGB Image

Object Detection Using Multi-Sensor Fusion Based on Deep Learning

Enhancing 3D object detection through multi-modal fusion for cooperative perception

Object detection using depth completion and camera-LiDAR fusion for autonomous driving

Multi-Task Foreground-Aware Network with Depth Completion for Enhanced RGB-D Fusion Object Detection Based on Transformer

CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving

Deep multi-scale and multi-modal fusion for 3D object detection

Fast vehicle detection based on colored point cloud with bird's eye view representation

DHA: Lidar and Vision Data Fusion-based on Road Object Classifier

Deep LiDAR-Radar-Visual Fusion for Object Detection in Urban Environments

Robust Dual-Modal Image Quality Assessment Aware Deep Learning Network for Traffic Targets Detection of Autonomous Vehicles.

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

RGB Camera and LiDAR Fusion for Road Detection