A-A KD: Attention and Activation Knowledge Distillation

Aorui Gou, Chao Liu, Heming Sun, Xiaoyang Zeng, Yibo Fan
DOI: https://doi.org/10.1109/BigMM52142.2021.00016
2021-01-01
Abstract: We propose a knowledge distillation method named attention and activation knowledge distillation (A-A KD) in this paper. By jointly exploiting the attention mechanism as an inter-channel method and activation information as an intra-channel method, the student model can overcome insufficient feature extraction and effectively mimic the features of the teacher model. A-A KD can outperform the s...