ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection

Weijia Zhang, Dongnan Liu, Chao Ma, Weidong Cai

2023-11-07

Abstract: Monocular 3D object detection (M3OD) is a significant yet inherently challenging task in autonomous driving due to the absence of explicit depth cues in a single RGB image. In this paper, we strive to boost currently underperforming monocular 3D object detectors by leveraging an abundance of unlabelled data via semi-supervised learning. Our proposed ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training. By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points to enable more foreground-attentive and efficient distillation via the proposed BEV occupancy guidance mask, leading to notably improved knowledge transfer and M3OD performance. Besides, motivated by insights into why existing cross-modal GT-sampling techniques fail on our task at hand, we further design a novel cross-modal object-wise data augmentation strategy for effective RGB-LiDAR joint learning. Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised, on both BEV and 3D detection metrics.
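The abstract does not spell out how the BEV occupancy guidance mask is built or applied, so the following is only a minimal sketch under stated assumptions: LiDAR points are binned into a BEV grid, a cell counts as "occupied" if it contains at least one point, and feature distillation is a squared error averaged over occupied cells only, so empty background cells contribute nothing. The grid range, cell size, the hard binary mask, and the names `bev_occupancy_mask` and `masked_distill_loss` are hypothetical choices, not ODM3D's implementation (which, per the summary below, also uses the mask for response distillation).

```python
# Hypothetical sketch of occupancy-masked BEV feature distillation; not the ODM3D code.
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the LiDAR frame: x forward, y left
Feature = List[float]                # one feature vector per BEV cell

def bev_occupancy_mask(points: Sequence[Point],
                       x_range: Tuple[float, float] = (0.0, 8.0),
                       y_range: Tuple[float, float] = (-4.0, 4.0),
                       cell: float = 2.0) -> List[List[int]]:
    """Return an nx-by-ny grid holding 1 where at least one point falls in the cell."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    mask = [[0] * ny for _ in range(nx)]
    for x, y, _z in points:
        # Points outside the assumed BEV range are ignored.
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            # Clamp guards against float rounding at the upper edge.
            i = min(int((x - x_range[0]) // cell), nx - 1)
            j = min(int((y - y_range[0]) // cell), ny - 1)
            mask[i][j] = 1
    return mask

def masked_distill_loss(student: List[List[Feature]],
                        teacher: List[List[Feature]],
                        mask: List[List[int]]) -> float:
    """Squared error between student and teacher BEV features, averaged over
    occupied cells only; returns 0.0 if no cell is occupied."""
    total, count = 0.0, 0
    for i, row in enumerate(mask):
        for j, occupied in enumerate(row):
            if occupied:
                total += sum((s - t) ** 2 for s, t in zip(student[i][j], teacher[i][j]))
                count += 1
    return total / count if count else 0.0

if __name__ == "__main__":
    pts = [(1.0, -3.0, 0.5), (5.5, 1.2, 0.3), (20.0, 0.0, 0.0)]  # last point is out of range
    mask = bev_occupancy_mask(pts)
    # Large teacher values in empty cells are ignored by the masked loss.
    teacher = [[[100.0, 100.0] for _ in range(4)] for _ in range(4)]
    teacher[0][0] = [1.0, 1.0]
    teacher[2][2] = [2.0, 0.0]
    student = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
    print(masked_distill_loss(student, teacher, mask))  # 3.0
```

Restricting the loss to occupied cells is one simple way to keep the many empty background cells from dominating the distillation signal, which is the foreground-sparsity issue the paper targets; a soft weighting would be an equally plausible variant.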

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the weak performance of monocular 3D object detection (M3OD), which stems from the lack of explicit depth cues in a single image. Specifically, it argues that current methods handle the foreground sparsity problem poorly: foreground training signals are insufficient and are suppressed by background noise. As a result, monocular 3D object detectors struggle to match LiDAR- or stereo-based methods in applications such as autonomous driving. To alleviate these problems, the paper proposes a framework named ODM3D (Occupancy-Guided Distillation for Monocular 3D Object Detection), which improves M3OD performance in three ways:

1. **Occupancy-Guided Cross-Modal Knowledge Distillation**: The localisation information in the LiDAR point cloud guides a knowledge transfer that focuses on foreground regions. A BEV (Bird's-Eye View) occupancy mask guides both the feature distillation and response distillation processes, so that the student model learns the 3D perception ability of the teacher model more effectively.
2. **Cross-Modal Data Augmentation Strategy (CMAug)**: A new occlusion-aware intersection score (OAIS) is designed to avoid severe occlusion. In addition, a pseudo-label-based collision detection method is introduced for unlabelled data, so that the augmented samples remain effective for training.
3. **Performance Improvement**: With the above methods, ODM3D achieves the best 3D and BEV detection performance on both the KITTI validation and test sets, significantly surpassing existing supervised and semi-supervised monocular 3D object detection methods.
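The pseudo-label-based collision check for unlabelled scenes can be illustrated with a small sketch. It assumes axis-aligned BEV boxes (real KITTI boxes are rotated), a hypothetical confidence threshold for trusting pseudo-labels, and greedy acceptance in candidate order; the names `overlap_area` and `select_pastes` are illustrative only. It does not reproduce the paper's occlusion-aware intersection score (OAIS), whose definition is not given here.

```python
# Hypothetical sketch of a BEV collision check for pasted objects; not the ODM3D code.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned BEV box: (x_min, y_min, x_max, y_max)

def overlap_area(a: Box, b: Box) -> float:
    """Area of intersection of two axis-aligned BEV boxes (0.0 if disjoint or touching)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def select_pastes(candidates: Sequence[Box],
                  pseudo_boxes: Sequence[Box],
                  pseudo_scores: Sequence[float],
                  score_thresh: float = 0.5) -> List[Box]:
    """Greedily keep sampled objects that do not overlap, in BEV, any confident
    pseudo-label or any object already accepted for pasting."""
    # Only pseudo-labels above the (assumed) confidence threshold act as obstacles.
    obstacles = [b for b, s in zip(pseudo_boxes, pseudo_scores) if s >= score_thresh]
    accepted: List[Box] = []
    for cand in candidates:
        if all(overlap_area(cand, ob) == 0.0 for ob in obstacles):
            accepted.append(cand)
            obstacles.append(cand)  # later candidates must not collide with this one
    return accepted

if __name__ == "__main__":
    pseudo = [(0.0, 0.0, 4.0, 2.0), (10.0, 0.0, 14.0, 2.0)]
    scores = [0.9, 0.2]  # second pseudo-label is too uncertain to block pasting
    cands = [(1.0, 1.0, 5.0, 3.0), (10.5, 0.5, 12.0, 1.5),
             (11.0, 1.0, 13.0, 3.0), (20.0, 0.0, 24.0, 2.0)]
    print(select_pastes(cands, pseudo, scores))
```

Treating accepted pastes as new obstacles keeps sampled objects from colliding with each other as well as with the scene's existing, pseudo-labelled objects.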
In summary, the paper's main objective is to solve the foreground sparsity problem in monocular 3D object detection through cross-modal knowledge distillation and data augmentation, thereby improving the model's detection performance.
