DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection

Zhourui Zhang, Jun Li, Zhijian Wu, Jifeng Shen, Jianhua Xu
2024-07-18
Abstract: In recent years, mainstream feature-masking distillation methods have mainly worked by reconstructing selectively masked regions of the student network's feature maps under the guidance of the teacher network's feature maps. In these methods, attention mechanisms help identify spatially important regions and crucial object-aware channel clues, so that the reconstructed features acquire discriminative and representational power comparable to that of the teacher features. However, previous feature-masking distillation methods mainly address homogeneous knowledge distillation and do not fully account for the heterogeneous knowledge distillation scenario. In particular, the large discrepancy between the teacher and student frameworks in the heterogeneous distillation paradigm is detrimental to feature masking and degrades the reconstructed student features. In this study, a novel dual feature-masking heterogeneous distillation framework for object detection, termed DFMSD, is proposed. More specifically, a stage-wise adaptation learning module is incorporated into the dual feature-masking framework so that the student model can be progressively adapted to the teacher models, bridging the gap between heterogeneous networks. Furthermore, a masking enhancement strategy is combined with stage-wise learning so that object-aware masking regions are adaptively strengthened to improve feature-masking reconstruction. In addition, semantic alignment is performed at each Feature Pyramid Network (FPN) layer between the teacher and student networks to produce consistent feature distributions. Experiments on object detection demonstrate the promise of the approach and suggest that DFMSD outperforms state-of-the-art heterogeneous and homogeneous distillation methods.
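The masked-reconstruction idea described in the abstract can be illustrated with a minimal NumPy sketch. It is a generic feature-masking distillation loss, not DFMSD's actual implementation. The function names `random_spatial_mask`, `attention_guided_mask`, and `masked_reconstruction_loss` are hypothetical. The sketch assumes student and teacher features have already been projected to the same shape `(C, H, W)` and that `generator` stands in for whatever learned block recovers teacher-like features. Using mean absolute activation as spatial attention, and masking the highest-attention positions, are illustrative assumptions.

```python
import numpy as np

def random_spatial_mask(h, w, ratio, rng):
    """Return an (h, w) binary mask in which roughly `ratio` of positions are 0 (masked)."""
    return (rng.random((h, w)) >= ratio).astype(np.float32)

def attention_guided_mask(teacher, ratio):
    """Mask the top-`ratio` spatial positions ranked by a simple teacher attention map.

    Assumption for illustration: attention = mean absolute activation over channels,
    and the most attended positions are the ones the student must reconstruct.
    """
    attn = np.abs(teacher).mean(axis=0)               # (H, W) spatial attention
    k = int(round(ratio * attn.size))                 # number of positions to mask
    mask = np.ones_like(attn, dtype=np.float32)
    if k > 0:
        top_idx = np.argsort(attn, axis=None)[-k:]    # flat indices of the k largest values
        mask.flat[top_idx] = 0.0
    return mask

def masked_reconstruction_loss(student, teacher, mask, generator):
    """MSE between teacher features and features generated from masked student features.

    student, teacher: arrays of shape (C, H, W); mask: (H, W), where 0 marks a masked position;
    generator: hypothetical callable mapping a (C, H, W) array to a (C, H, W) array.
    """
    masked = student * mask[None, :, :]     # zero out masked positions in every channel
    reconstructed = generator(masked)       # student branch tries to recover teacher-like features
    return float(np.mean((reconstructed - teacher) ** 2))
```

In a real detector, `generator` would be a small trainable convolutional block, and the loss would be summed over FPN levels alongside the detection loss. The identity function is only a stand-in for checking the arithmetic.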
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor performance of existing feature-masking distillation methods in heterogeneous knowledge distillation, where the teacher and student networks have different architectures. Specifically, the paper points out that most current feature-masking distillation methods are designed for homogeneous knowledge distillation. They assume that the teacher and student models share a similar structure, with the teacher typically differing only in having a stronger backbone network. In heterogeneous distillation, however, the teacher and student networks differ substantially in structure. This limits how much knowledge can be transferred directly from teacher to student and makes it difficult for the student to learn effectively, which degrades distillation performance. To address this, the paper proposes the dual feature-masking stage-wise knowledge distillation (DFMSD) framework. DFMSD aims to improve knowledge transfer between heterogeneous networks and to enhance the student model's learning and performance through a stage-wise adaptation learning (SAL) module, a masking enhancement strategy, and semantic alignment. According to the paper, the framework both handles knowledge transfer between heterogeneous networks better and outperforms existing homogeneous and heterogeneous distillation methods on object detection tasks.
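Semantic alignment at each FPN level can be approximated by standardizing each channel before comparing student and teacher maps. This swapped-in technique is borrowed from Pearson-correlation-style feature distillation, and whether DFMSD aligns features exactly this way is an assumption. The sketch is hypothetical: `standardize` and `fpn_alignment_loss` are illustrative names, and each level is assumed to be a `(C, H, W)` array with matching channel counts between student and teacher.

```python
import numpy as np

def standardize(feat, eps=1e-6):
    """Normalize each channel of a (C, H, W) map to zero mean and (approximately) unit variance."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mean) / (std + eps)

def fpn_alignment_loss(student_levels, teacher_levels):
    """Average, over FPN levels, of the MSE between standardized student and teacher maps.

    Standardizing removes per-channel scale and offset differences, so heterogeneous
    networks are compared on the shape of their feature distributions rather than raw magnitudes.
    """
    if len(student_levels) != len(teacher_levels):
        raise ValueError("student and teacher must provide the same number of FPN levels")
    losses = [np.mean((standardize(s) - standardize(t)) ** 2)
              for s, t in zip(student_levels, teacher_levels)]
    return float(np.mean(losses))
```

With this choice, a student map that is an affine rescaling of the teacher map (for example `2 * t + 5`) incurs almost zero loss. An inverted map incurs a loss near 4.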