Abstract: Resource-constrained perception systems such as edge computing and vision-for-robotics require vision models to be both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy to enhance the performance of lightweight classification models, its application to structured outputs like object detection and instance segmentation remains a complicated task, due to the variability in outputs and complex internal network modules involved in the distillation process. In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student. To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt. Our progressive strategy can be easily combined with existing detection distillation mechanisms to consistently maximize student performance in various settings. To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students, and unprecedentedly boost the performance of ResNet-50 based RetinaNet from 36.5% to 42.0% AP and Mask R-CNN from 38.2% to 42.5% AP on the MS COCO benchmark.

What problem does this paper attempt to address?

This paper addresses how to reduce the computation and memory footprint of vision models while maintaining high accuracy in resource-constrained perception systems such as edge computing and robotic vision. Specifically, it focuses on using knowledge distillation to transfer knowledge from complex teacher models to lightweight student models for object detection and instance segmentation. Because these tasks produce structured outputs and involve complex internal network modules, traditional knowledge distillation methods are hard to apply directly. The paper therefore proposes Multi-Teacher Progressive Distillation (MTPD), which aims to overcome the limitations of existing distillation methods on structured-output tasks, especially when teacher and student architectures differ substantially, as in knowledge transfer from Transformer-based teachers to convolution-based students.

The main contributions of the paper are:

1. A framework for learning lightweight detectors through multi-teacher progressive distillation. The method is simple, effective, and general: it automatically designs a sequence of teachers suited to a given student and distills from them step by step.
2. MTPD is a meta-level strategy that can easily be combined with previous detection distillation methods. A comprehensive empirical evaluation on the MS COCO dataset shows consistent performance improvements regardless of the complexity of the distillation loss.
3. Lightweight RetinaNet and Mask R-CNN models trained with MTPD achieve state-of-the-art accuracy in various settings, especially with heterogeneous backbones and input resolutions. In particular, heterogeneous distillation from Transformer-based teacher detectors to convolution-based students is studied for the first time, and progressive distillation is found to be key to bridging the gap between them.
4. Empirical analysis shows that the improvement comes mainly from better generalization rather than better optimization. Knowledge transferred from multiple teachers guides the student to a flatter minimum, which helps it generalize better.

With these methods, the paper addresses how to efficiently train high-performance lightweight models for resource-constrained environments, supporting edge devices and real-time applications.
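
The step-by-step distillation described above can be illustrated with a toy sketch. This is not the paper's implementation. The student here is a hypothetical 1-D linear model, the teachers are plain functions, the distillation loss is a simple mean squared error on outputs, and the ordering rule (closest teacher first, measured by output gap) is an assumed stand-in for however MTPD actually selects its teacher sequence. All function names are invented for illustration.

```python
from typing import Callable, List, Sequence

Teacher = Callable[[float], float]


def student(w: List[float], x: float) -> float:
    """Toy student: a 1-D linear model y = w[0] + w[1] * x."""
    return w[0] + w[1] * x


def output_gap(w: List[float], teacher: Teacher, xs: Sequence[float]) -> float:
    """Mean squared gap between student and teacher outputs (assumed proxy)."""
    return sum((student(w, x) - teacher(x)) ** 2 for x in xs) / len(xs)


def order_teachers(w: List[float], teachers: Sequence[Teacher],
                   xs: Sequence[float]) -> List[Teacher]:
    """Assumed ordering rule: the teacher closest to the student goes first."""
    return sorted(teachers, key=lambda t: output_gap(w, t, xs))


def distill_stage(w: List[float], teacher: Teacher, xs: Sequence[float],
                  lr: float = 0.1, steps: int = 200) -> List[float]:
    """One stage: gradient descent on the MSE between student and one teacher."""
    w0, w1 = w
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x in xs:
            err = (w0 + w1 * x) - teacher(x)  # student output minus teacher output
            g0 += 2.0 * err / n               # d(MSE)/d(w0)
            g1 += 2.0 * err * x / n           # d(MSE)/d(w1)
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def progressive_distill(w: List[float], teachers: Sequence[Teacher],
                        xs: Sequence[float]) -> List[float]:
    """Distill from each teacher in turn, starting each stage from the last result."""
    for teacher in order_teachers(w, teachers, xs):
        w = distill_stage(w, teacher, xs)
    return w


XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
small_teacher = lambda x: 0.5 + 1.0 * x   # hypothetical weaker teacher
large_teacher = lambda x: 1.0 + 2.0 * x   # hypothetical stronger teacher
final_w = progressive_distill([0.0, 0.0], [large_teacher, small_teacher], XS)
```

Because this toy student is convex, the final weights land near the last teacher regardless of the path taken. The sketch only shows how the stages are chained, not the generalization benefit the paper reports for real detectors.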

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation
