Abstract: Object detection (OD) is a fundamental computer vision task, and many OD algorithms and models have been developed to address different problems. The performance of current models has steadily improved and their applications have expanded. However, the models have also become more complex, with larger numbers of parameters, which makes them difficult to deploy in industrial applications. Knowledge distillation (KD), proposed in 2015, was first applied to image classification and quickly expanded to other visual tasks. A likely reason is that complex teacher models can transfer knowledge (learned from large-scale data or other multi-modal data) to lightweight student models, thereby achieving both model compression and performance improvement. Although KD was introduced into OD only in 2017, recent years have seen a surge in related publications, especially in 2021 and 2022. This paper therefore presents a comprehensive survey of KD-based OD models from recent years, in the hope of providing researchers with an overview of recent progress in the field. Moreover, we conduct an in-depth analysis of existing works to identify their advantages and limitations, and explore future research directions, in an attempt to provide researchers with inspiration for designing models for related tasks.
In brief, we summarize the basic principles for designing KD-based OD models, describe related KD-based OD tasks (performance improvement for lightweight models, catastrophic forgetting in incremental OD, small object detection (S-OD), weakly/semi-supervised OD, etc.), analyze novel distillation techniques (different types of distillation loss, feature interaction between teacher and student models, KD of multi-modal prior information, joint distillation using multiple teacher models, self-feature distillation, etc.), and present an overview of extended applications on several specific datasets (remote sensing images, 3D point cloud datasets, etc.). After comparing and analyzing the performance of different models on several common datasets, we discuss promising directions for solving specific OD problems.

X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

Towards Efficient 3D Object Detection with Knowledge Distillation

CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection

UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection

LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

Representation Disparity-aware Distillation for 3D Object Detection

BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for Multi-View BEV 3D Object Detection

CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection

InstKD: Towards Lightweight 3D Object Detection With Instance-Aware Knowledge Distillation

BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection

Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection

MKD-Cooper: Cooperative 3D Object Detection for Autonomous Driving via Multi-Teacher Knowledge Distillation

Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection

When Object Detection Meets Knowledge Distillation: A Survey

Weak-to-Strong 3D Object Detection with X-Ray Distillation

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient