Channel-Correlation-Based Selective Knowledge Distillation
Jianping Gou, Xiangshuo Xiong, Baosheng Yu, Yibing Zhan, Zhang Yi
DOI: https://doi.org/10.1109/tcds.2022.3232569
IF: 4.546
2022-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: As a simple yet effective model compression method, knowledge distillation (KD) trains a small, lightweight student network by transferring valuable knowledge from a pretrained, cumbersome teacher network. However, existing KD methods usually consider feature knowledge either across different layers or for individual samples, and fail to explore the more detailed information carried by different channels from the perspective of sample relationships. Meanwhile, the negative influences contained in the teacher knowledge are not well investigated, especially when response-based knowledge is used. To address these issues, we devise a novel KD approach called channel-correlation-based selective KD (CCSKD). Specifically, to distill rich knowledge from feature representations, we consider not only the feature knowledge from different channels for individual samples but also the relational knowledge based on per-channel features across different samples. Furthermore, to distill positive response-based knowledge, we develop a selective strategy, i.e., selective KD, that progressively corrects the negative influences of the teacher knowledge during the distillation process. We perform extensive experiments on three image classification data sets, CIFAR-100, Stanford Cars, and Tiny-ImageNet, to demonstrate the effectiveness of the proposed CCSKD, which outperforms recent state-of-the-art methods by a clear margin. Our code is publicly available at https://github.com/gjplab/CCSKD.
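The two ideas named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the linked repository for that); every function name, the use of cosine similarity, the mean-squared-error matching, and the temperature value are assumptions made for illustration. In particular, the selective term below is a simpler stand-in that drops samples the teacher misclassifies, rather than the progressive correction the abstract describes. Features are assumed to be nested lists indexed as `feats[sample][channel]`, each entry being a flattened spatial map, with the student and teacher sharing the same channel count.

```python
import math


def cosine_gram(vectors):
    """Pairwise cosine similarity between a list of equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in vectors]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(vectors, norms)]
            for u, nu in zip(vectors, norms)]


def mse(a, b):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def channel_correlation_loss(student_feats, teacher_feats):
    """Hypothetical feature loss: match channel and sample correlations."""
    # Intra-sample term: channel-by-channel similarity within each sample.
    intra = sum(mse(cosine_gram(s), cosine_gram(t))
                for s, t in zip(student_feats, teacher_feats))
    intra /= len(student_feats)
    # Inter-sample term: for each channel, sample-by-sample similarity.
    num_channels = len(student_feats[0])
    inter = 0.0
    for c in range(num_channels):
        s_c = [sample[c] for sample in student_feats]
        t_c = [sample[c] for sample in teacher_feats]
        inter += mse(cosine_gram(s_c), cosine_gram(t_c))
    inter /= num_channels
    return intra + inter


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - peak - log_total for z in scaled]


def selective_kd_loss(student_logits, teacher_logits, labels, temperature=4.0):
    """Hypothetical selective KD: distill only from correct teacher outputs."""
    total, kept = 0.0, 0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        teacher_pred = max(range(len(t)), key=t.__getitem__)
        if teacher_pred != y:
            continue  # assumed rule: skip samples the teacher gets wrong
        log_t = log_softmax(t, temperature)
        log_s = log_softmax(s, temperature)
        # KL(teacher || student) on the softened distributions.
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
        kept += 1
    return (temperature ** 2) * total / kept if kept else 0.0
```

In a training loop, the two terms would typically be added to the usual cross-entropy loss with weighting coefficients; those weights are hyperparameters that this sketch does not fix.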