LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

Sanmin Kim, Youngseok Kim, Sihwan Hwang, Hyeonjun Jeong, Dongsuk Kum

2024-07-14

Abstract: Recent advancements in camera-based 3D object detection have introduced cross-modal knowledge distillation to bridge the performance gap with LiDAR 3D detectors, leveraging the precise geometric information in LiDAR point clouds. However, existing cross-modal knowledge distillation methods tend to overlook the inherent imperfections of LiDAR, such as the ambiguity of measurements on distant or occluded objects, which should not be transferred to the image detector. To mitigate these imperfections in the LiDAR teacher, we propose a novel method that leverages aleatoric uncertainty-free features from ground truth labels. In contrast to conventional label guidance approaches, we approximate the inverse function of the teacher's head to effectively embed label inputs into feature space. This approach provides additional accurate guidance alongside the LiDAR teacher, thereby boosting the performance of the image detector. Additionally, we introduce feature partitioning, which effectively transfers knowledge from the teacher modality while preserving the distinctive features of the student, thereby maximizing the potential of both modalities. Experimental results demonstrate that our approach improves mAP and NDS by 5.1 points and 4.9 points compared to the baseline model, proving the effectiveness of our approach. The code is available at https://github.com/sanmin0312/LabelDistill

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the performance gap between camera-based 3D object detectors and LiDAR-based detectors. It proposes LabelDistill, a label-guided cross-modal knowledge distillation method that improves 3D object detectors that use only images.

The paper points out that existing cross-modal knowledge distillation methods often overlook the inherent limitations of LiDAR data, such as the ambiguity of measurements on distant or occluded objects. Because of these limitations, imperfect features can be transferred from the LiDAR teacher model to the image detector and degrade its performance. To mitigate this, the paper proposes two components:

1. **Label Distillation**: Uses ground truth labels to compensate for the deficiencies of LiDAR data. The labels are embedded into the feature space by approximating the inverse function of the LiDAR teacher's detection head, which provides more accurate guidance alongside the LiDAR teacher.
2. **Feature Partitioning**: During distillation, the image features are divided into groups so that the student retains its own image-specific features while learning from both the LiDAR teacher and the labels. This exploits the strengths of both modalities.

Experimental results on the nuScenes dataset show that, compared to the baseline model, LabelDistill improves mean Average Precision (mAP) by 5.1 points and nuScenes Detection Score (NDS) by 4.9 points.
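The two ideas described above can be illustrated with a small toy sketch. This is not the authors' implementation: scalars and short lists stand in for BEV feature maps, a fixed linear map stands in for the LiDAR teacher's detection head, and every name here (`teacher_head`, `fit_label_encoder`, `partitioned_distill_loss`, `HEAD_W`, `HEAD_B`) is hypothetical. The split into a LiDAR-guided group, a label-guided group, and an unconstrained remainder is likewise an assumption made for illustration.

```python
# Toy stand-ins (assumptions): scalars and short lists replace BEV feature
# maps, and a fixed linear map replaces the teacher's detection head.
HEAD_W, HEAD_B = 2.0, 1.0  # hypothetical frozen head parameters


def teacher_head(feature):
    """Frozen teacher head: maps a feature to a prediction."""
    return HEAD_W * feature + HEAD_B


def fit_label_encoder(labels, steps=500, lr=0.01):
    """Fit an encoder e(y) = a*y + c so that teacher_head(e(y)) ~= y.

    Only the encoder is updated while the head stays frozen, so gradient
    descent pushes the encoder toward the inverse of the head.
    """
    a, c = 0.0, 0.0
    n = len(labels)
    for _ in range(steps):
        grad_a = grad_c = 0.0
        for y in labels:
            err = teacher_head(a * y + c) - y  # reconstruction error
            # Gradients of the mean squared error w.r.t. a and c.
            grad_a += 2.0 * err * HEAD_W * y / n
            grad_c += 2.0 * err * HEAD_W / n
        a -= lr * grad_a
        c -= lr * grad_c
    return a, c


def mse(xs, ys):
    """Mean squared error between two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def partitioned_distill_loss(student, lidar_feat, label_feat):
    """Distill into disjoint channel groups of the student feature.

    The first len(lidar_feat) channels mimic the LiDAR teacher, the next
    len(label_feat) channels mimic the label features, and any remaining
    channels receive no distillation loss (kept as image-specific).
    """
    n_lidar, n_label = len(lidar_feat), len(label_feat)
    lidar_group = student[:n_lidar]
    label_group = student[n_lidar:n_lidar + n_label]
    return mse(lidar_group, lidar_feat) + mse(label_group, label_feat)
```

With the toy head above, `fit_label_encoder([-2.0, -1.0, 0.0, 1.0, 2.0])` approaches the exact inverse `(y - 1) / 2`, so feeding an encoded label back through `teacher_head` recovers the label. In `partitioned_distill_loss`, changing the trailing student channels leaves the loss unchanged, which is the sense in which the student keeps features of its own.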

Research on Knowledge Distillation Algorithm of Object Detection

UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

Distilling Object Detectors with Global Knowledge

Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection

Structured Knowledge Distillation for Accurate and Efficient Object Detection

PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection

Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Towards Efficient 3D Object Detection with Knowledge Distillation

InstKD: Towards Lightweight 3D Object Detection With Instance-Aware Knowledge Distillation

Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection

Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation

LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection

Focal Distillation from High-Resolution Data to Low-Resolution Data for 3D Object Detection

Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection

BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection