MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation

Qihang Yang, Yang Zhao, Hong Cheng
2024-10-11
Abstract: Autonomous driving necessitates advanced object detection techniques that integrate information from multiple modalities to overcome the limitations associated with single-modal approaches. The challenges of aligning diverse data in early fusion and the complexities, along with overfitting issues introduced by deep fusion, underscore the efficacy of late fusion at the decision level. Late fusion ensures seamless integration without altering the original detector's network structure. This paper introduces a pioneering Multi-modal Multi-class Late Fusion method, designed for late fusion to enable multi-class detection. Fusion experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements, presenting our model as a versatile solution for multi-modal object detection in autonomous driving. Moreover, our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy and providing more reliable insights into category predictions.
Computer Vision and Pattern Recognition, Systems and Control
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses multi-modal object detection for autonomous driving. It proposes a Multi-modal Multi-class Late Fusion (MMLF) method to overcome the inaccuracies of single-modal detectors in complex, dynamic scenes. The paper focuses on the following aspects:

1. **Multi-modal Data Fusion**:
   - **Early Fusion**: Early fusion methods face data alignment challenges and require precise preprocessing and normalization.
   - **Deep Fusion**: Deep fusion methods add parameters and computational complexity, which can lead to overfitting and longer training times.
   - **Late Fusion**: Late fusion operates at the decision level, so it integrates detectors without altering their original network structure, which improves flexibility and adaptability.
2. **Uncertainty Estimation**:
   - Conventional deep learning detectors do not indicate how reliable their predictions are, which is a problem in safety-critical applications such as autonomous driving.
   - The paper integrates uncertainty analysis into the classification fusion process, which makes the model more transparent and its class predictions more dependable.
3. **Multi-class Detection**:
   - The proposed method handles multi-class detection in addition to multi-modal fusion, which improves the model's generality and robustness.

### Main Contributions

- **Flexible 2D and 3D Detector Fusion**: The method can integrate various 2D and 3D detectors, as long as they produce detection results for the same classes.
- **Late Fusion of Multi-class Features**: The paper optimizes the Trusted Multi-View Classification (TMC) method by matching 3D and 2D candidate pairs that have non-zero Intersection over Union (IoU), achieving late fusion of multi-class features.
- **Uncertainty-aware Class Fusion Analysis**: The paper quantifies the uncertainty of classification results and shows that the fusion process reduces it.

### Experimental Results

- **KITTI Validation Set**: The fusion methods (Fuse-v3 and Fuse-v4) improved on the baseline methods (Ori-v3 and Ori-v4) across multiple metrics, including 2D bounding box detection, 2D orientation detection, Bird's Eye View (BEV) detection, and 3D bounding box detection, with improvements ranging from +0.18% to +19.57%.
- **KITTI Test Set**: The fusion methods also improved on the baselines on the test set across the same metrics, with improvements ranging from +0.92% to +16.72%.

### Uncertainty Analysis

- **Quantitative Results**: Uncertainty scores show that fusion reduces uncertainty. For example, for the car category, the average uncertainty scores of the original methods (Ori-v3 and Ori-v4) were 0.11827 and 0.11881, respectively, while the fusion methods (Fuse-v3 and Fuse-v4) reduced them to 0.02692 and 0.02737, respectively.
- **Qualitative Results**: Visualized results show that false detections with high uncertainty scores were filtered out, further improving detection reliability.

In summary, the paper addresses several challenges in multi-modal object detection with the MMLF method and reports significant improvements, particularly in uncertainty estimation and multi-class detection.
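
The pairing and fusion steps listed under Main Contributions can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that 3D detections have already been projected to axis-aligned image-plane boxes, that each detector supplies non-negative per-class evidence, and that fusion follows the standard evidential formulation used in TMC (Dirichlet-based opinions combined with a reduced Dempster's rule). The paper's optimized variant may differ in its details. All function names (`iou_2d`, `match_pairs`, `opinion`, `combine`) and the sample values are hypothetical.

```python
import numpy as np


def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_pairs(boxes_2d, boxes_3d_proj):
    """Return (i, j) index pairs whose image-plane IoU is non-zero."""
    return [
        (i, j)
        for i, a in enumerate(boxes_2d)
        for j, b in enumerate(boxes_3d_proj)
        if iou_2d(a, b) > 0.0
    ]


def opinion(evidence):
    """Map non-negative per-class evidence to (belief vector, uncertainty).

    Assumes Dirichlet parameters alpha = evidence + 1, so that
    sum(belief) + uncertainty == 1.
    """
    e = np.asarray(evidence, dtype=float)
    k = e.size
    strength = e.sum() + k
    return e / strength, k / strength


def combine(b1, u1, b2, u2):
    """Reduced Dempster's rule for two opinions over the same classes."""
    # Conflict: belief mass the two sources assign to different classes.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = (u1 * u2) / scale
    return b, u


# Hypothetical inputs: one 2D detection and two projected 3D detections.
boxes_2d = [(0, 0, 10, 10)]
boxes_3d_proj = [(5, 5, 15, 15), (20, 20, 30, 30)]
pairs = match_pairs(boxes_2d, boxes_3d_proj)

# Hypothetical per-class evidence from the 2D and 3D detectors for one pair.
b2d, u2d = opinion([8, 1, 1])
b3d, u3d = opinion([6, 2, 0])
fused_b, fused_u = combine(b2d, u2d, b3d, u3d)
print(pairs, fused_b, fused_u)
```

When both detectors favour the same class, the fused uncertainty falls below either input uncertainty, which matches the direction of the reduction reported in the Uncertainty Analysis section.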