Sparse Friendly Distillation Using Feature Decoupling

WeiHong He, YuLi Fu, Youjun Xiang
DOI: https://doi.org/10.21203/rs.3.rs-3811616/v1
2024-01-01
Abstract: We introduce a sparse-friendly distillation framework, a training strategy for knowledge distillation with sparse student models. Model sparsity techniques are widely adopted to reduce training overhead, but sparse student models often struggle to perform well under knowledge distillation. Our framework builds on the observation that sparse student models behave differently on foreground and background features. We separate these features using different pooling techniques and apply a mean squared error (MSE) feature distillation loss to each. We also dynamically adjust the weights of the two loss components to optimize performance. Experiments on the CIFAR-10 and CIFAR-100 benchmarks show significant performance improvements, and we provide a comprehensive analysis of the results.
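The decoupling idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say which pooling operators, feature shapes, or weighting rule the authors use, so everything here is an assumption made for illustration: max pooling stands in for the foreground branch, average pooling for the background branch, and a linear schedule driven by a hypothetical `progress` value stands in for the dynamic weighting. The function names `pool2d` and `decoupled_distill_loss` are likewise hypothetical and not taken from the paper.

```python
import numpy as np


def pool2d(x, k, mode):
    """Non-overlapping k x k pooling over a feature map of shape (C, H, W)."""
    c, h, w = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    # Split each spatial axis into (number of blocks, block size).
    blocks = x.reshape(c, h // k, k, w // k, k)
    if mode == "max":
        return blocks.max(axis=(2, 4))
    return blocks.mean(axis=(2, 4))


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def decoupled_distill_loss(f_student, f_teacher, k=2, progress=0.0):
    """Return (total, foreground, background) feature distillation losses.

    Assumption: max pooling emphasises salient (foreground) responses and
    average pooling emphasises smooth (background) responses.
    Assumption: `progress` in [0, 1] is the fraction of training completed;
    the linear weighting below is a placeholder, not the paper's rule.
    """
    loss_fg = mse(pool2d(f_student, k, "max"), pool2d(f_teacher, k, "max"))
    loss_bg = mse(pool2d(f_student, k, "avg"), pool2d(f_teacher, k, "avg"))
    w_fg = 1.0 - progress
    w_bg = progress
    return w_fg * loss_fg + w_bg * loss_bg, loss_fg, loss_bg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 4, 4))
    student = teacher + 0.1 * rng.normal(size=(8, 4, 4))
    total, fg, bg = decoupled_distill_loss(student, teacher, k=2, progress=0.3)
    print(f"total={total:.5f} foreground={fg:.5f} background={bg:.5f}")
```

In a real training loop the two terms would be computed on intermediate features of the teacher and the sparse student inside an autodiff framework and added to the task loss; the sketch only shows how the two pooled views give two separately weighted MSE terms.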