Knowledge Distillation via Multi-Teacher Feature Ensemble

Xin Ye, Rongxin Jiang, Xiang Tian, Rui Zhang, Yaowu Chen
DOI: https://doi.org/10.1109/lsp.2024.3359573
2024-02-10
IEEE Signal Processing Letters
Abstract: This letter proposes a novel method for effectively utilizing multiple teachers in feature-based knowledge distillation. Our method involves a multi-teacher feature ensemble module for generating a robust feature ensemble and a student-teacher mapping module for bridging the student feature and ensemble feature. In addition, we utilize separate optimization, where the student's feature extractor is optimized under distillation supervision while its classifier is obtained through classifier reconstruction. We evaluate our method on the CIFAR-100, ImageNet and MS-COCO datasets, and the experimental results demonstrate its effectiveness.
engineering, electrical & electronic
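
The abstract names two modules, a multi-teacher feature ensemble and a student-teacher mapping, without giving their exact form. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes the ensemble is a weighted average of teacher feature vectors, the mapping is a single linear projection, and the distillation loss is a mean squared error. All function names (ensemble_features, map_student, feature_distillation_loss) are hypothetical, and the separate optimization and classifier reconstruction steps are not covered.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def ensemble_features(teacher_feats: Sequence[Vector],
                      weights: Optional[Sequence[float]] = None) -> Vector:
    """Combine teacher features into one vector.

    Assumption: a weighted average (uniform by default). The paper's
    ensemble module may be more elaborate than this.
    """
    n = len(teacher_feats)
    if weights is None:
        weights = [1.0 / n] * n
    dim = len(teacher_feats[0])
    return [sum(w * f[i] for w, f in zip(weights, teacher_feats))
            for i in range(dim)]


def map_student(student_feat: Vector, projection: Sequence[Vector]) -> Vector:
    """Project the student feature into the ensemble feature space.

    Assumption: a plain linear map, with one row per ensemble dimension.
    """
    return [sum(p * s for p, s in zip(row, student_feat))
            for row in projection]


def feature_distillation_loss(mapped: Vector, target: Vector) -> float:
    """Mean squared error between mapped student and ensemble features."""
    return sum((m - t) ** 2 for m, t in zip(mapped, target)) / len(target)


# Toy example: two teachers with 2-dim features, a student with 3-dim features.
teachers = [[1.0, 2.0], [3.0, 4.0]]
ensemble = ensemble_features(teachers)

# Hypothetical fixed projection that keeps the first two student dimensions.
projection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
student = [2.0, 1.0, 5.0]
mapped = map_student(student, projection)

loss = feature_distillation_loss(mapped, ensemble)
print(ensemble, mapped, loss)
```

In a real training setup the projection would be learned jointly with the student's feature extractor, and the features would be tensors rather than Python lists. The list-based version is used here only to keep the sketch dependency-free.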