MDR: Multi-stage Decoupled Relational Knowledge Distillation with Adaptive Stage Selection

Jiaqi Wang, Lu, Mingmin Chi, Jian Chen
DOI: https://doi.org/10.1145/3664647.3681520
2024-01-01
Abstract: The effectiveness of contrastive-learning-based Knowledge Distillation (KD) has sparked renewed interest in relational distillation, but these methods typically focus on angle-wise information from the penultimate layer. We show that exploiting relational information derived from intermediate layers further improves the effectiveness of distillation. We also find that adding distance-wise relational information to contrastive-learning-based methods degrades distillation quality, revealing an implicit contention between angle-wise and distance-wise attributes. We therefore propose a Multi-stage Decoupled Relational (MDR) KD framework equipped with adaptive stage selection, which identifies the stages that transfer relational knowledge most effectively. The MDR framework decouples angle-wise and distance-wise information to resolve their conflict while still preserving complete relational knowledge, resulting in higher transfer efficiency and distillation quality. To evaluate the proposed method, we conduct extensive experiments on multiple image benchmarks, i.e., CIFAR-100, ImageNet, and Pascal VOC, covering various tasks, i.e., classification, few-shot learning, transfer learning, and object detection. Our method exhibits superior performance under diverse scenarios, surpassing the state of the art by an average of 1.22% on CIFAR-100 across widely used teacher-student network pairs.
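To make the distinction between distance-wise and angle-wise relational information concrete, the following is a minimal sketch and not the MDR method itself. It assumes a simplified pairwise formulation: distance-wise relations are mean-normalized Euclidean distances between sample embeddings, and angle-wise relations are pairwise cosine similarities. The function names (`distance_relations`, `angle_relations`, `decoupled_relation_losses`) and the squared-error losses are illustrative assumptions; the abstract does not specify the loss functions or the stage selection mechanism.

```python
import numpy as np

def distance_relations(feats, eps=1e-12):
    """Pairwise Euclidean distances, divided by their mean off-diagonal value.

    Assumed normalization: it makes the relation matrix invariant to a
    single global rescaling of the embeddings.
    """
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = ~np.eye(feats.shape[0], dtype=bool)
    return dist / (dist[off_diag].mean() + eps)

def angle_relations(feats, eps=1e-12):
    """Pairwise cosine similarities between L2-normalized embeddings."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    return normed @ normed.T

def decoupled_relation_losses(teacher, student):
    """Return (distance_loss, angle_loss) as two separate scalars.

    Both relation matrices are N x N, so teacher and student embeddings
    may have different feature dimensions.
    """
    d_loss = np.mean((distance_relations(teacher) - distance_relations(student)) ** 2)
    a_loss = np.mean((angle_relations(teacher) - angle_relations(student)) ** 2)
    return d_loss, a_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 16))
    # Rescale each sample by a different positive factor: directions (angles)
    # are preserved, but pairwise distances change.
    student = teacher * np.linspace(0.5, 2.0, 8)[:, None]
    d_loss, a_loss = decoupled_relation_losses(teacher, student)
    print("distance loss:", d_loss, "angle loss:", a_loss)
```

In this toy setup, rescaling each sample independently leaves the angle-wise term at essentially zero while the distance-wise term is clearly positive, which illustrates that the two kinds of relational signal can disagree and why one might treat them as separate objectives.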