Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System

Daniel Dworak, Mateusz Komorkiewicz, Paweł Skruch, Jerzy Baranowski
2024-04-25
Abstract: In this paper, we propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Specifically, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation method to convert these features into 3D space. We then fuse them with extracted radar data using a complementary fusion strategy to produce a final 3D object representation. To demonstrate the effectiveness of our approach, we evaluate it on the NuScenes dataset. We compare our approach to both single-sensor performance and current state-of-the-art fusion methods. Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to effectively fuse camera and radar sensor data in the autonomous vehicle perception system to improve the performance of 3D object detection**. Specifically, the paper proposes a new low-level method for fusing camera images with radar point clouds, so that the strengths of both sensors contribute to more accurate and robust object detection. The main contributions of the paper are:

1. **New low-level fusion method**: A projection-less method based on tensor-orientation matching, called **Cross-Domain Spatial Matching (CDSM)**, is proposed for fusing camera and radar data inside the neural network structure.
2. **Lightweight solution**: The method is competitive and computationally efficient, reducing the consumption of computational resources while maintaining high performance.
3. **Multi-view processing architecture**: A multi-view architecture uses a single-stage network to process camera images and radar point cloud data separately, then aligns and fuses the resulting feature maps in 3D space through the CDSM module.
4. **Experimental verification**: Experiments on the NuScenes dataset compare the method with single-sensor solutions and with other top-level fusion methods. It outperforms the single-sensor solutions and is competitive with the fusion methods.

Through these contributions, the paper aims to provide a more efficient and reliable object detection method for the autonomous vehicle perception system, especially in complex and dynamic traffic environments.
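
To make the idea of low-level fusion on a shared spatial grid more concrete, here is a minimal NumPy sketch. It is not the authors' CDSM module, whose internals are not described on this page. As an illustrative assumption, camera features are brought into a bird's-eye-view layout by collapsing image height and repeating along a new depth axis (no calibration or projection is used), radar points are rasterized into the same grid, and the two are concatenated channel-wise. The function names `camera_to_bev`, `radar_to_bev`, and `fuse`, the grid sizes, and the radar point layout `(x, y, rcs)` are all hypothetical.

```python
import numpy as np


def camera_to_bev(cam_feat: np.ndarray, depth_bins: int) -> np.ndarray:
    """Reorient front-view features (C, H, W) into a BEV-like (C, D, W) tensor.

    Hypothetical stand-in for a projection-less alignment step: image height
    is collapsed by max pooling and the result is repeated along depth.
    """
    if cam_feat.ndim != 3:
        raise ValueError("expected camera features shaped (C, H, W)")
    pooled = cam_feat.max(axis=1)  # (C, W)
    return np.repeat(pooled[:, None, :], depth_bins, axis=1)  # (C, D, W)


def radar_to_bev(points, depth_bins, width_bins, x_range, y_range):
    """Rasterize radar points (N, 3) = (x forward, y lateral, rcs) into (2, D, W).

    Channel 0 counts points per cell, channel 1 sums their rcs values.
    Points outside the given ranges are dropped.
    """
    points = np.asarray(points, dtype=np.float32)
    x, y, rcs = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    d = ((x[keep] - x_range[0]) / (x_range[1] - x_range[0]) * depth_bins).astype(int)
    w = ((y[keep] - y_range[0]) / (y_range[1] - y_range[0]) * width_bins).astype(int)
    d = np.minimum(d, depth_bins - 1)  # guard against float rounding at the edge
    w = np.minimum(w, width_bins - 1)
    grid = np.zeros((2, depth_bins, width_bins), dtype=np.float32)
    np.add.at(grid[0], (d, w), 1.0)
    np.add.at(grid[1], (d, w), rcs[keep])
    return grid


def fuse(cam_bev: np.ndarray, radar_bev: np.ndarray) -> np.ndarray:
    """Channel-wise concatenation of two feature maps on the same (D, W) grid."""
    if cam_bev.shape[1:] != radar_bev.shape[1:]:
        raise ValueError("camera and radar grids must share the same (D, W) shape")
    return np.concatenate([cam_bev, radar_bev], axis=0)
```

In a real network the fused tensor would feed a detection head, and the alignment step would be learned rather than fixed. The sketch only shows why both modalities must share one spatial layout before their features can be combined.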