Knowledge Distillation Via Channel Correlation Structure

Bo Li, Bin Chen, Yunxiao Wang, Tao Dai, Maowei Hu, Yong Jiang, Shutao Xia
DOI: https://doi.org/10.1007/978-3-030-82136-4_29
2021-01-01
Abstract: Knowledge distillation (KD) has been one of the most popular techniques for model compression and acceleration, where a compact student model is trained under the guidance of a large-capacity teacher model. The key to existing KD methods is to explore multiple types of knowledge that direct the student's training to mimic the teacher's behaviour. To this end, we aim to explore knowledge in the channel correlation structure, in terms of the intra-instance and inter-instance relationships within a mini-batch, which can be extracted and transferred from the teacher's various outputs. Specifically, we propose a novel KD loss derived from the Channel Correlation Structure (CCS), which includes feature-based and relation-based knowledge. With this novel KD loss, we can align the channel correlations of the feature maps of both the teacher and student models through their channel correlation matrices. Extensive experiments are performed to verify the effectiveness of our method compared with other KD methods on two benchmark datasets.
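As a rough illustration of aligning channel correlation matrices, the sketch below builds one C x C matrix per instance and penalises the squared difference between the teacher's and the student's matrices. It is not the paper's formulation, which the abstract does not spell out. The names `channel_correlation` and `ccs_loss` are hypothetical. The use of cosine similarity between flattened channels, the mean-squared penalty, and the requirement that teacher and student expose the same number of channels are all assumptions. Only the intra-instance case is covered.

```python
import math


def channel_correlation(feature_map):
    """Return a C x C matrix of cosine similarities between channels.

    feature_map: list of C channels, each a flat list of H*W activations.
    (Assumption: cosine similarity stands in for "channel correlation".)
    """
    normed = []
    for channel in feature_map:
        # Guard against all-zero channels to avoid division by zero.
        norm = math.sqrt(sum(v * v for v in channel)) or 1.0
        normed.append([v / norm for v in channel])
    return [
        [sum(a * b for a, b in zip(ci, cj)) for cj in normed]
        for ci in normed
    ]


def ccs_loss(teacher_batch, student_batch):
    """Mean squared difference between teacher and student channel
    correlation matrices, averaged over the mini-batch.

    Assumption: both models expose the same number of channels C;
    their spatial sizes may differ, since each matrix is C x C.
    """
    total = 0.0
    for t_feat, s_feat in zip(teacher_batch, student_batch):
        t_corr = channel_correlation(t_feat)
        s_corr = channel_correlation(s_feat)
        c = len(t_corr)
        squared = sum(
            (t_corr[i][j] - s_corr[i][j]) ** 2
            for i in range(c)
            for j in range(c)
        )
        total += squared / (c * c)
    return total / len(teacher_batch)


# One instance, two channels, two spatial positions.
teacher = [[[1.0, 0.0], [0.0, 1.0]]]  # channels are orthogonal
student = [[[1.0, 0.0], [1.0, 0.0]]]  # channels are identical
print(ccs_loss(teacher, student))
```

Because each matrix depends only on the number of channels, this sketch can compare feature maps with different spatial resolutions. Mismatched channel counts would need an extra projection step that is not shown here.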