CrossKD: Cross-Head Knowledge Distillation for Object Detection

Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou

2024-04-15

Abstract: Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors. Existing state-of-the-art KD methods for object detection are mostly based on feature imitation. In this paper, we present a general and effective prediction mimicking distillation scheme, called CrossKD, which delivers the intermediate features of the student's detection head to the teacher's detection head. The resulting cross-head predictions are then forced to mimic the teacher's predictions. This manner relieves the student's head from receiving contradictory supervision signals from the annotations and the teacher's predictions, greatly improving the student's detection performance. Moreover, as mimicking the teacher's predictions is the target of KD, CrossKD offers more task-oriented information in contrast with feature imitation. On MS COCO, with only prediction mimicking losses applied, our CrossKD boosts the average precision of GFL ResNet-50 with 1x training schedule from 40.2 to 43.7, outperforming all existing KD methods. In addition, our method also works well when distilling detectors with heterogeneous backbones. Code is available at https://github.com/jbwang1997/CrossKD.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the target conflict problem that arises in knowledge distillation (KD) for object detection. In conventional prediction mimicking, the student's predictions are supervised by both the ground-truth targets and the teacher's predictions. Because the target assigners of the student and the teacher can be inconsistent, these two supervision signals may contradict each other, which hampers optimization and degrades the student's final performance. To resolve this, the paper proposes Cross-Head Knowledge Distillation (CrossKD). CrossKD passes the intermediate features of the student's detection head to the teacher's detection head to produce cross-head predictions, and then forces these cross-head predictions to mimic the teacher's predictions. This alleviates the target conflict, makes prediction mimicking more effective, and provides more task-oriented information than feature imitation, which yields larger gains than existing methods on object detection. For example, on MS COCO, using only prediction mimicking losses, CrossKD raises the Average Precision (AP) of GFL ResNet-50 from 40.2 to 43.7, surpassing all existing KD methods. Experiments also show that CrossKD is orthogonal to feature imitation methods and can be combined with them to further improve performance.
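The cross-head mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `split` parameter, the use of bias-free linear layers in place of convolutional head layers, and the choice of KL divergence as the mimicking loss are all assumptions made for illustration. The sketch shows the key routing idea only: the student's intermediate head feature is sent through the remaining teacher head layers, and the distillation loss is computed on that cross-head prediction, so the student's own prediction is left to be supervised by the ground truth alone.

```python
import math

def linear(x, weights):
    """Bias-free linear layer (a stand-in for a conv layer); weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def run_head(x, layers, start=0):
    """Run layers[start:] on x, applying ReLU after every layer except the last."""
    for i in range(start, len(layers)):
        x = linear(x, layers[i])
        if i < len(layers) - 1:
            x = relu(x)
    return x

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); assumed here as the mimicking loss."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_head_kd_loss(student_feat, teacher_feat, student_head, teacher_head, split):
    """Return (distillation loss, student prediction) for one location.

    `split` (hypothetical parameter) is the number of student head layers applied
    before the feature is handed to the teacher head; 0 < split < len(head).
    """
    assert 0 < split < len(student_head) == len(teacher_head)
    # Student's intermediate head feature after its first `split` layers.
    x = student_feat
    for i in range(split):
        x = relu(linear(x, student_head[i]))
    # Cross-head prediction: the remaining (frozen) teacher layers consume it.
    cross_pred = run_head(x, teacher_head, start=split)
    # Teacher's own prediction from its own feature.
    teacher_pred = run_head(teacher_feat, teacher_head)
    # Student's own prediction; in this scheme it would be trained on ground truth only.
    student_pred = run_head(x, student_head, start=split)
    kd_loss = kl_divergence(softmax(teacher_pred), softmax(cross_pred))
    return kd_loss, student_pred

# Toy two-layer heads with made-up weights.
teacher_head = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0], [0.5, 2.0]]]
student_head = [[[0.5, 0.2], [-0.1, 0.8]], [[0.3, 0.3], [1.0, -0.5]]]
feat = [1.0, 2.0]

kd_loss, student_pred = cross_head_kd_loss(feat, feat, student_head, teacher_head, split=1)
same_loss, _ = cross_head_kd_loss(feat, feat, teacher_head, teacher_head, split=1)
```

In this toy setup the distillation loss is positive when the student's early head layers differ from the teacher's, and it vanishes when the student head is identical to the teacher head. Because the loss depends on the student only through the layers before `split`, the student's final prediction layers never receive the teacher's signal directly.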

CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection

Research on Knowledge Distillation Algorithm of Object Detection

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-Guided Feature Imitation

Distilling object detectors with efficient logit mimicking and mask-guided feature imitation

Gradient-Guided Knowledge Distillation for Object Detectors

Prediction-Guided Distillation for Dense Object Detection

G-DetKD: Towards general distillation framework for object detectors via contrastive and semantic-guided feature imitation

Structured Knowledge Distillation for Accurate and Efficient Object Detection

UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors

Structural Knowledge Distillation for Object Detection

Focal and Global Knowledge Distillation for Detectors

Learning Efficient Detector with Semi-supervised Adaptive Distillation

Dual Relation Knowledge Distillation for Object Detection

Towards Efficient 3D Object Detection with Knowledge Distillation

Cosine similarity-guided knowledge distillation for robust object detectors

Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors

Shared Knowledge Distillation Network for Object Detection

InstKD: Towards Lightweight 3D Object Detection With Instance-Aware Knowledge Distillation

Distilling Object Detectors With Fine-Grained Feature Imitation

Adaptive Cross-Architecture Mutual Knowledge Distillation