Abstract: Autonomous driving necessitates advanced object detection techniques that integrate information from multiple modalities to overcome the limitations of single-modal approaches. The difficulty of aligning heterogeneous data in early fusion, together with the complexity and overfitting introduced by deep fusion, underscores the efficacy of late fusion at the decision level. Late fusion ensures seamless integration without altering the original detectors' network structures. This paper introduces a pioneering Multi-modal Multi-class Late Fusion method that enables multi-class detection through late fusion. Fusion experiments on the KITTI validation and official test datasets show substantial performance improvements, presenting our model as a versatile solution for multi-modal object detection in autonomous driving. Moreover, our approach incorporates uncertainty analysis into the classification fusion process, making the model more transparent and trustworthy and providing more reliable insights into category predictions.
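
As a rough illustration of classification fusion with uncertainty, the sketch below follows the evidential scheme of Trusted Multi-View Classification (TMC), which the paper reports optimizing. Each detector's non-negative per-class evidence `e` becomes Dirichlet parameters `alpha = e + 1` with strength `S = sum(alpha)`, giving belief masses `b_k = e_k / S` and an uncertainty `u = K / S` for `K` classes; two such opinions are then merged with TMC's reduced Dempster's rule. It assumes MMLF keeps these rules unchanged, and the evidence vectors are invented inputs, since how the detectors' scores are turned into evidence is not described here.

```python
import numpy as np


def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to a subjective-logic opinion.

    As in TMC: alpha_k = e_k + 1, strength S = sum(alpha),
    belief b_k = e_k / S and uncertainty u = K / S, so sum(b) + u = 1.
    """
    e = np.asarray(evidence, dtype=float)
    num_classes = e.size
    strength = e.sum() + num_classes  # S = sum_k (e_k + 1)
    return e / strength, num_classes / strength


def combine_opinions(b1, u1, b2, u2):
    """Fuse two opinions with the reduced Dempster's rule used in TMC."""
    # Conflict C = sum over i != j of b1_i * b2_j (mass placed on different classes).
    conflict = np.outer(b1, b2).sum() - np.dot(b1, b2)
    scale = 1.0 / (1.0 - conflict)
    fused_b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    fused_u = scale * u1 * u2
    return fused_b, fused_u


# Assumption: MMLF is taken to reuse TMC's rules as-is; the summary only says it optimizes TMC.
# Hypothetical evidence for (Car, Pedestrian, Cyclist) from one matched 3D/2D candidate pair.
b_3d, u_3d = opinion_from_evidence([8, 1, 1])
b_2d, u_2d = opinion_from_evidence([6, 0, 2])
b_fused, u_fused = combine_opinions(b_3d, u_3d, b_2d, u_2d)

print(int(np.argmax(b_fused)))  # 0, i.e. Car
print(u_3d, u_2d, u_fused)      # roughly 0.231, 0.273, 0.080
```

Because `u = K / S` is always positive, `1 - C` is at least as large as either input uncertainty, so the fused uncertainty `u1 * u2 / (1 - C)` never exceeds either input. That is consistent with the drop in uncertainty scores reported for the fused models.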

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the challenges of multi-modal object detection in autonomous driving. Specifically, it proposes a Multi-modal Multi-class Late Fusion (MMLF) method to overcome the inaccuracy of single-modal methods in complex, dynamic scenes. The paper focuses mainly on the following aspects:

1. **Multi-modal Data Fusion**:
   - **Early Fusion**: Early fusion methods face challenges in data alignment and require precise data preprocessing and normalization.
   - **Deep Fusion**: Deep fusion methods introduce more parameters and computational complexity, leading to overfitting and longer training times.
   - **Late Fusion**: Late fusion operates at the decision level, integrating detector outputs without altering the network structure of the original detectors, which makes it more flexible and adaptable.
2. **Uncertainty Estimation**:
   - Conventional deep learning detectors do not provide reliable confidence estimates for their predictions, which is a problem in safety-critical applications such as autonomous driving.
   - The paper integrates uncertainty analysis into the classification fusion process, making the model more transparent and its class predictions more dependable.
3. **Multi-class Detection**:
   - The proposed method handles not only multi-modal data fusion but also multi-class detection, improving the model's generality and robustness.

### Main Contributions

- **Flexible 2D and 3D Detector Fusion**: Various 2D and 3D detectors can be integrated seamlessly, as long as they produce detections for the same classes.
- **Late Fusion of Multi-class Features**: Optimizes the Trusted Multi-View Classification (TMC) method by pairing 3D and 2D candidates whose Intersection over Union (IoU) is non-zero, achieving late fusion of multi-class features.
- **Uncertainty-aware Class Fusion Analysis**: Quantifies the uncertainty of classification results and shows that the fusion process effectively reduces it.

### Experimental Results

- **KITTI Validation Set**: The fusion methods (Fuse-v3 and Fuse-v4) improved significantly over the baseline methods (Ori-v3 and Ori-v4) across multiple metrics, including 2D bounding box detection, 2D orientation detection, Bird's Eye View (BEV) detection, and 3D bounding box detection, with gains ranging from +0.18% to +19.57%.
- **KITTI Test Set**: The fusion methods also improved over the baselines on the official test set across the same metrics, with gains ranging from +0.92% to +16.72%.

### Uncertainty Analysis

- **Quantitative Results**: Computing uncertainty scores shows that fusion reduces uncertainty. For the Car category, for example, the average uncertainty scores of the original methods (Ori-v3 and Ori-v4) were 0.11827 and 0.11881, respectively, while the fusion methods (Fuse-v3 and Fuse-v4) reduced them to 0.02692 and 0.02737, respectively.
- **Qualitative Results**: Visualizations show that false detections with high uncertainty scores were successfully filtered out, further improving detection reliability.

In summary, by proposing MMLF this paper addresses several challenges in multi-modal object detection, with notable advances in uncertainty estimation and multi-class detection.
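
The candidate-pairing step (3D and 2D candidates are paired when their IoU is non-zero) and the removal of high-uncertainty false detections can be sketched as follows. Assumptions: the 3D boxes have already been projected into the image plane as axis-aligned `(x1, y1, x2, y2)` boxes using the camera calibration (projection not shown), and `candidate_pairs`, `filter_by_uncertainty`, and the `0.1` threshold are hypothetical names and values, since the summary does not say how the filtering is performed.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """IoU between each box in boxes_a (N, 4) and each box in boxes_b (M, 4).

    Boxes are (x1, y1, x2, y2) in image pixels and are assumed to have positive area.
    """
    a = np.asarray(boxes_a, dtype=float)[:, None, :]  # (N, 1, 4)
    b = np.asarray(boxes_b, dtype=float)[None, :, :]  # (1, M, 4)
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)  # shape (N, M)


def candidate_pairs(projected_3d, boxes_2d):
    """Hypothetical helper: list (i, j, iou) for every 3D/2D pair with non-zero IoU."""
    iou = pairwise_iou(projected_3d, boxes_2d)
    rows, cols = np.nonzero(iou > 0)
    return [(int(i), int(j), float(iou[i, j])) for i, j in zip(rows, cols)]


def filter_by_uncertainty(uncertainty, threshold):
    """Hypothetical rule: keep indices whose fused uncertainty is below threshold."""
    return [i for i, u in enumerate(uncertainty) if u < threshold]


projected_3d = [[0, 0, 10, 10], [20, 20, 30, 30]]  # 3D boxes already projected to the image
boxes_2d = [[5, 5, 15, 15], [100, 100, 110, 110]]
print(candidate_pairs(projected_3d, boxes_2d))  # one pair: (0, 0) with IoU 25/175, about 0.143
print(filter_by_uncertainty([0.03, 0.40, 0.05], threshold=0.1))  # [0, 2]
```

Keeping every pair with non-zero IoU, rather than only the best match per box, mirrors the pairing criterion stated above; the fixed threshold is only the simplest stand-in for whatever filtering the paper actually uses.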

MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation

3DMMF: 3D Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

mmFUSION: Multimodal Fusion for 3D Objects Detection

MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection

Multi-spectral Image Fusion for Moving Object Detection

Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel

Rethinking the Late Fusion of LiDAR-Camera Based 3D Object Detection

Multi-Modal 3D Object Detection by Box Matching

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

MMFusion: A Generalized Multi-Modal Fusion Detection Framework

Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection

MMAF-Net: Multi-view multi-stage adaptive fusion for multi-sensor 3D object detection

Enhancing 3D object detection through multi-modal fusion for cooperative perception

A Generalized Multi-Modal Fusion Detection Framework

Fusion Strategy of Multi-sensor Based Object Detection for Self-driving Vehicles

Multi-Modal and Multi-Scale Fusion 3D Object Detection of 4D Radar and LiDAR for Autonomous Driving

Multi-Sem Fusion: Multimodal Semantic Fusion for 3-D Object Detection

MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion