Abstract: Transformers have revolutionized object detection through DETRs (DEtection TRansformers), which are acclaimed for their simplicity and efficacy. Despite these advantages, the large size of these models poses significant challenges for practical deployment, particularly in resource-constrained environments. This paper addresses the compression of DETR through knowledge distillation, a technique that promises to maintain model performance while reducing model size. A critical aspect of DETR performance is the reliance on queries to interpret object representations accurately. Existing distillation methods often focus exclusively on positive queries, identified through bipartite matching, and neglect the rich information present in hard-negative queries. Our visual analysis indicates that hard-negative queries, which focus on foreground elements, are crucial for improving distillation outcomes. To this end, we introduce a novel Group Query Selection strategy. It departs from traditional query selection in DETR distillation by grouping queries according to their Generalized Intersection over Union (GIoU) with ground-truth objects, thereby uncovering valuable hard-negative queries for distillation. Furthermore, we present the Knowledge Distillation via Query Selection for DETR (QSKD) framework, which incorporates Attention-Guided Feature Distillation (AGFD) and Local Alignment Prediction Distillation (LAPD). These components optimize the distillation process by focusing on the most informative aspects of the teacher model's intermediate features and outputs. Comprehensive experiments on the MS-COCO dataset demonstrate the effectiveness of our approach, which significantly improves average precision (AP) across various DETR architectures without incurring substantial computational costs. Specifically, the AP of Conditional DETR with a ResNet-18 backbone increases from 35.8 to 39.9.
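The Group Query Selection idea partitions decoder queries by how well their predicted boxes overlap ground-truth objects, measured by GIoU. The following is a minimal Python sketch of that grouping step, assuming boxes in (x1, y1, x2, y2) format with positive area and two hypothetical thresholds (0.5 and 0.0). The names `giou` and `group_queries` are illustrative and are not the authors' code. The sketch also omits bipartite matching, so it does not separate matched positive queries from hard negatives. It only shows how a GIoU score buckets queries.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box (x1, y1, x2, y2); assumed non-degenerate (x2 > x1, y2 > y1).
Box = Tuple[float, float, float, float]


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two boxes: IoU minus the fraction of the
    enclosing box not covered by the union. Range is (-1, 1]."""
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both a and b.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (enclose - union) / enclose


def group_queries(
    pred_boxes: Sequence[Box],
    gt_boxes: Sequence[Box],
    thresholds: Sequence[float] = (0.5, 0.0),  # hypothetical, descending
) -> List[List[int]]:
    """Bucket query indices by their best GIoU against any ground-truth box.

    Group k holds queries whose best GIoU is >= thresholds[k] but below
    every earlier threshold; the last group holds all remaining queries.
    """
    groups: List[List[int]] = [[] for _ in range(len(thresholds) + 1)]
    for qi, pb in enumerate(pred_boxes):
        # With no ground truth, treat every query as maximally negative.
        best = max((giou(pb, gb) for gb in gt_boxes), default=-1.0)
        for k, t in enumerate(thresholds):
            if best >= t:
                groups[k].append(qi)
                break
        else:  # no threshold matched
            groups[-1].append(qi)
    return groups
```

With one ground-truth box at (0, 0, 10, 10), a query predicting exactly that box falls in the first group. A query predicting (0, 0, 10, 4) has GIoU 0.4 and falls in the middle group, which is the kind of foreground-focused but imperfect query the abstract treats as a candidate hard negative. A distant query such as (20, 20, 30, 30) has negative GIoU and falls in the last group.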
