Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang

2024-06-11

Abstract: Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper exploration of latent knowledge. Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources. Extensive experiments validate the effectiveness of our proposed approach across various distillation strategies, detectors, and backbone architectures. Specifically, following our proposed paradigm, the existing FGD method achieves state-of-the-art (SoTA) performance, with ResNet50-based GFL achieving 44.1% mAP on the COCO dataset, surpassing the baselines by 3.9%.
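As a rough illustration of the Monte Carlo dropout step mentioned in the abstract, the Python sketch below keeps dropout active at inference, runs several stochastic forward passes through a toy teacher head, and reads the per-channel variance across passes as an uncertainty estimate. The single linear layer, the function names `toy_teacher_head` and `mc_dropout_estimate`, the dropout rate of 0.2, and the 30 passes are all hypothetical choices for illustration, not details taken from the paper; in a real detector the dropout would act on the teacher's feature maps.

```python
import random
import statistics


def toy_teacher_head(features, weights, drop_p, rng):
    """Hypothetical one-layer teacher head with inverted dropout on its inputs.

    features: input activations at one spatial position (list of floats).
    weights:  one row per output channel, each row as long as `features`.
    """
    keep = 1.0 - drop_p
    # Fresh Bernoulli mask per call; surviving units are scaled by 1/keep
    # so the expected activation matches the dropout-free forward pass.
    mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in features]
    dropped = [f * m for f, m in zip(features, mask)]
    return [sum(w * d for w, d in zip(row, dropped)) for row in weights]


def mc_dropout_estimate(features, weights, drop_p=0.2, passes=30, seed=0):
    """Monte Carlo dropout: keep dropout active at inference and repeat.

    Returns (means, variances), one entry per output channel. The variance
    across the stochastic passes serves as a per-channel uncertainty score.
    """
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    rng = random.Random(seed)
    samples = [toy_teacher_head(features, weights, drop_p, rng) for _ in range(passes)]
    per_channel = list(zip(*samples))  # one tuple of `passes` values per channel
    means = [statistics.fmean(c) for c in per_channel]
    variances = [statistics.pvariance(c) for c in per_channel]
    return means, variances


feats = [1.0, 0.5, -0.3, 2.0]
weights = [[0.2, -0.1, 0.4, 0.3],   # channel 0 depends on every input
           [0.0, 0.0, 0.0, 0.0]]    # channel 1 ignores its inputs
means, variances = mc_dropout_estimate(feats, weights)
print(means, variances)
```

Channel 1 has all-zero weights, so every pass returns 0 and its variance is exactly 0, while channel 0 fluctuates with the sampled masks and gets a positive variance. Inverted dropout is used so that the mean over passes estimates the dropout-free output rather than a scaled-down version of it.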

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses a key issue in knowledge distillation (KD) for object detection: the uncertainty of the teacher model's knowledge. Specifically:

1. **Problems with existing methods**: Existing feature-level KD methods often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. Ignoring this uncertainty can lead the student model to rely on inaccurate guidance from the teacher, which limits what the student can learn.
2. **Proposed method**: To tackle this, the authors propose a new feature-level distillation paradigm, "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)". It uses Monte Carlo dropout to quantify the uncertainty of the teacher's knowledge and incorporates that uncertainty into the student's training, reducing the risk that the student is misled by over-relying on the teacher.
3. **Experimental validation**: The authors ran extensive experiments on the MS COCO dataset. The results show that the UET paradigm significantly improves the student model across various distillation strategies, detectors, and backbone architectures. For example, with the existing FGD method following the UET paradigm, a ResNet50-based GFL detector reaches 44.1% mAP on COCO, 3.9% above the baseline.

In summary, the main contribution of the paper is a new paradigm that accounts for the uncertainty of the teacher's knowledge, making feature-level knowledge distillation more effective.
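The summary does not spell out how UET turns uncertainty into a training signal, so the Python sketch below assumes one simple interpretation of "reducing over-reliance on the teacher": a feature-imitation loss that down-weights positions where the teacher's Monte Carlo dropout variance is high. The function name `uncertainty_weighted_mse` and the weighting `1 / (1 + var)` are hypothetical, chosen for illustration; they are not the paper's Discriminative Knowledge Extraction or Knowledge Transfer modules.

```python
def uncertainty_weighted_mse(student, teacher_mean, teacher_var):
    """Hypothetical feature-imitation MSE that trusts confident positions more.

    weight_i = 1 / (1 + var_i) lies in (0, 1]: a position where the teacher's
    MC-dropout variance is 0 keeps full weight, a noisy one is down-weighted.
    Dividing by the weight sum keeps the loss on the same scale as plain MSE.
    """
    if not student or not (len(student) == len(teacher_mean) == len(teacher_var)):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v < 0 for v in teacher_var):
        raise ValueError("variances must be non-negative")
    weights = [1.0 / (1.0 + v) for v in teacher_var]
    weighted_sq_err = sum(w * (s - m) ** 2
                          for w, s, m in zip(weights, student, teacher_mean))
    return weighted_sq_err / sum(weights)


student = [1.0, 2.0]
teacher_mean = [0.0, 0.0]
print(uncertainty_weighted_mse(student, teacher_mean, [0.0, 0.0]))  # plain MSE: 2.5
print(uncertainty_weighted_mse(student, teacher_mean, [0.0, 3.0]))  # 1.6
```

When the teacher is uncertain about the second position (variance 3.0), that position's larger error counts for only a quarter as much, so the loss drops from 2.5 to 1.6. A bounded weight such as `1 / (1 + var)` is used here instead of `1 / var` because the latter diverges wherever the teacher's variance is zero.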

Research on Knowledge Distillation Algorithm of Object Detection

Structured Knowledge Distillation for Accurate and Efficient Object Detection

Focal and Global Knowledge Distillation for Detectors

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-Guided Feature Imitation

Prediction-Guided Distillation for Dense Object Detection

Knowledge Probabilization in Ensemble Distillation: Improving Accuracy and Uncertainty Quantification for Object Detectors

Shared Knowledge Distillation Network for Object Detection

Gradient-Guided Knowledge Distillation for Object Detectors

UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors

Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors

DETRDistill: A Universal Knowledge Distillation Framework for DETR-families

Learning Efficient Detector with Semi-supervised Adaptive Distillation

Empowering Object Detection: Unleashing the Potential of Decoupled and Interactive Distillation

Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

DFD: Distilling the Feature Disparity Differently for Detectors

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

Channel-level Matching Knowledge Distillation for object detectors via MSE

CrossKD: Cross-Head Knowledge Distillation for Object Detection

Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition

Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup