Distilling object detectors with efficient logit mimicking and mask-guided feature imitation
Xin Lu, Yichao Cao, Shikun Chen, Weixuan Li, Xin Zhou, Xiaobo Lu
DOI: https://doi.org/10.1016/j.eswa.2023.123079
IF: 8.5
2024-01-10
Expert Systems with Applications
Abstract: Knowledge distillation (KD) is a promising approach to learning compact object detection models that inherit information from intricate teacher networks. In this paper, we identify several shortcomings of existing KD methods for object detectors, e.g., the lack of knowledge selection and coarse feature imitation masks. To address these issues, we present a novel KD framework that trains efficient object detectors via Logit Mimicking and Feature Imitation (LMFI). First, a novel logit mimicking method is proposed to distill the classification and localization heads. On the one hand, it is the first to mimic the classification logits of one-category object detectors. On the other hand, it exploits localization knowledge from both teacher predictions and ground truths, which dynamically guides the learning of the student's regression outputs in stages. Second, an adaptive positive teacher selection (APTS) strategy is designed to obtain high-quality teacher samples during distillation, which reduces the transmission of inferior knowledge. Moreover, a soft metric and a fine-grained mask are heuristically introduced to reconcile the discrepancy between teacher and student features in a position-wise manner. Extensive experiments show that LMFI outperforms state-of-the-art KD frameworks for object detection. It significantly boosts the performance of various detectors on different benchmarks, e.g., MR⁻² reductions of 2.83% and 2.58% for Cascade R-CNN on the S and All sets of CityPersons, and an improvement of the ResNet-50-based TOOD from 40.3% to 42.9% mAP on the COCO benchmark.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science
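The abstract names two distillation signals, logit mimicking for a one-category classification head and mask-guided, position-wise feature imitation, without giving their formulas. The sketch below is only a generic illustration of those two ideas and is not the LMFI method itself. The function names, the temperature-softened sigmoid, the binary KL divergence, and the per-position mask weights are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_logit_mimic_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean binary KL divergence between softened teacher and student scores.

    Assumption: a one-category detector emits a single logit per sample,
    so the softened distribution is Bernoulli rather than a softmax.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_t = sigmoid(t / temperature)
        p_s = sigmoid(s / temperature)
        total += p_t * math.log((p_t + eps) / (p_s + eps))
        total += (1 - p_t) * math.log((1 - p_t + eps) / (1 - p_s + eps))
    return total / len(student_logits)


def masked_feature_imitation_loss(student_feat, teacher_feat, mask):
    """Mask-weighted mean squared error, computed position by position.

    student_feat, teacher_feat: one list of channel values per position.
    mask: hypothetical per-position weights in [0, 1]; how LMFI builds its
    fine-grained mask is not described in the abstract.
    """
    weighted = 0.0
    for s_vec, t_vec, w in zip(student_feat, teacher_feat, mask):
        sq_err = sum((s - t) ** 2 for s, t in zip(s_vec, t_vec)) / len(s_vec)
        weighted += w * sq_err
    norm = sum(mask)
    return weighted / norm if norm > 0 else 0.0
```

In this sketch, a mask weight of 0 removes a position from the imitation loss entirely, while fractional weights let positions contribute in proportion to how much the teacher is trusted there.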