Abstract: Transformers have revolutionized the object detection landscape by introducing DETRs, acclaimed for their simplicity and efficacy. Despite these advantages, the substantial size of these models poses significant challenges for practical deployment, particularly in resource-constrained environments. This paper addresses the challenge of compressing DETR through knowledge distillation, a technique that holds promise for maintaining model performance while reducing size. A critical aspect of DETRs' performance is their reliance on queries to interpret object representations accurately. Traditional distillation methods often focus exclusively on positive queries, identified through bipartite matching, and neglect the rich information present in hard-negative queries. Our visual analysis indicates that hard-negative queries, which focus on foreground elements, are crucial for enhancing distillation outcomes. To this end, we introduce a novel Group Query Selection strategy, which diverges from traditional query selection in DETR distillation by segmenting queries based on their Generalized Intersection over Union (GIoU) with ground truth objects, thereby uncovering valuable hard-negative queries for distillation. Furthermore, we present the Knowledge Distillation via Query Selection for DETR (QSKD) framework, which incorporates Attention-Guided Feature Distillation (AGFD) and Local Alignment Prediction Distillation (LAPD). These components optimize the distillation process by focusing on the most informative aspects of the teacher model's intermediate features and output. Our comprehensive experimental evaluation on the MS-COCO dataset demonstrates the effectiveness of our approach, which significantly improves average precision (AP) across various DETR architectures without incurring substantial computational costs. Specifically, the AP of Conditional DETR ResNet-18 increased from 35.8 to 39.9.
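To make the idea of segmenting queries by GIoU concrete, the following is a minimal sketch and not the authors' implementation. It assumes boxes in (x1, y1, x2, y2) format with positive area, takes the set of bipartite-matched query indices as given, and splits the remaining queries with a single hypothetical threshold `hard_thresh`. The function names, the three group labels, and the threshold value are illustrative assumptions; the abstract does not specify how many groups are used or where their boundaries lie.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2) with positive area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (enclose - union) / enclose


def group_queries(pred_boxes, gt_boxes, matched, hard_thresh=0.5):
    """Split query indices into positive, hard-negative, and easy-negative groups.

    `matched` holds indices already assigned by bipartite matching (assumed given).
    `hard_thresh` is a hypothetical cut-off chosen for illustration only.
    """
    groups = {"positive": [], "hard_negative": [], "easy_negative": []}
    for i, box in enumerate(pred_boxes):
        if i in matched:
            groups["positive"].append(i)
            continue
        # Best GIoU against any ground-truth box; -1.0 if there are none.
        best = max((giou(box, gt) for gt in gt_boxes), default=-1.0)
        key = "hard_negative" if best >= hard_thresh else "easy_negative"
        groups[key].append(i)
    return groups


gt = [(0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 60, 60)]
groups = group_queries(preds, gt, matched={0})
```

In this toy input, the second prediction overlaps the ground truth heavily without being the matched query, so it lands in the hard-negative group, while the distant third prediction is an easy negative.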

DETRDistill: A Universal Knowledge Distillation Framework for DETR-families

Distilling DETR-like Detectors with Instance-aware Feature

Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Knowledge Distillation via Query Selection for Detection Transformer

D$^3$ETR: Decoder Distillation for Detection Transformer

Research on Knowledge Distillation Algorithm of Object Detection

OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer

DFD: Distilling the Feature Disparity Differently for Detectors

Focal and Global Knowledge Distillation for Detectors

Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Distilling Object Detectors with Global Knowledge

Distilling Object Detectors via Decoupled Features

PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-Guided Feature Imitation

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

Distilling object detectors with efficient logit mimicking and mask-guided feature imitation

Hands-on Guidance for Distilling Object Detectors

Distilling Object Detectors With Fine-Grained Feature Imitation

IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors

CrossKD: Cross-Head Knowledge Distillation for Object Detection

UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors