BFD: Binarized Frequency-enhanced Distillation for Vision Transformer

Hanglin Li, Peng Yin, Xiaosu Zhu, Lianli Gao, Jingkuan Song
DOI: https://doi.org/10.1109/icme57554.2024.10688360
2024-01-01
Abstract: Binarization offers significant advantages on resource-limited devices, particularly for recent Vision Transformers (ViTs). Knowledge Distillation (KD) is a crucial and beneficial technique for alleviating the performance degradation caused by binarization. However, we find that conventional methods for distilling ViTs overlook high-frequency information, which causes fine-grained features to be lost and degrades performance. To address this challenge, we introduce Binarized Frequency-enhanced Distillation (BFD), a plug-and-play method that effectively preserves high-frequency information. Specifically, we propose High-Frequency Enhanced Distillation (HFED), which transfers attention maps from teacher to student in the frequency domain and thereby enhances high-frequency information. Additionally, based on the finding that the proportion of frequency components varies across layers, we further propose Progressive Frequency Partitioning (PFP) to separate frequency bands flexibly. Extensive experiments demonstrate the effectiveness of BFD across a set of ViT variants. We demonstrate superior performance of 51.65% over ViT-B on TinyImageNet, and outperform the state of the art (SOTA) by a substantial margin of 20.70% on ImageNet.
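The abstract describes HFED and PFP only at a high level, so the following is a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes that "distillation in the frequency domain" means comparing the 2-D discrete Fourier transforms of teacher and student attention maps, that frequencies are split into a low and a high band by a radial cutoff around the DC bin, and that the high band is up-weighted by a hypothetical factor `high_weight`. For PFP it assumes a hypothetical linear per-layer cutoff schedule; the start and end values and the direction of change are arbitrary. All function names (`band_errors`, `frequency_distill_loss`, `progressive_cutoffs`, `layerwise_loss`) are hypothetical. A naive pure-Python DFT keeps the sketch dependency-free; a practical version would use a batched FFT such as `numpy.fft.fft2` or `torch.fft.fft2`.

```python
import cmath
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one attention map (e.g. tokens x tokens)


def dft2(x: Matrix) -> List[List[complex]]:
    """Naive 2-D DFT, O((N*M)^2); adequate only for small illustrative maps."""
    n, m = len(x), len(x[0])
    return [
        [
            sum(
                x[i][j] * cmath.exp(-2j * math.pi * (u * i / n + v * j / m))
                for i in range(n)
                for j in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]


def normalized_radius(u: int, v: int, n: int, m: int) -> float:
    """Distance of DFT bin (u, v) from the DC bin, scaled to [0, 1]."""
    fu = min(u, n - u) / (n / 2)  # |signed frequency| along rows
    fv = min(v, m - v) / (m / 2)  # |signed frequency| along columns
    return math.sqrt(fu * fu + fv * fv) / math.sqrt(2)


def band_errors(teacher: Matrix, student: Matrix, cutoff: float) -> Tuple[float, float]:
    """Mean squared spectral error in the low (r <= cutoff) and high (r > cutoff) bands."""
    ft, fs = dft2(teacher), dft2(student)
    n, m = len(teacher), len(teacher[0])
    sums, counts = [0.0, 0.0], [0, 0]
    for u in range(n):
        for v in range(m):
            band = 1 if normalized_radius(u, v, n, m) > cutoff else 0
            sums[band] += abs(ft[u][v] - fs[u][v]) ** 2
            counts[band] += 1
    low = sums[0] / counts[0] if counts[0] else 0.0
    high = sums[1] / counts[1] if counts[1] else 0.0
    return low, high


def frequency_distill_loss(teacher: Matrix, student: Matrix, cutoff: float,
                           high_weight: float = 2.0) -> float:
    """Hypothetical HFED-style loss: the high-frequency band is up-weighted."""
    low, high = band_errors(teacher, student, cutoff)
    return low + high_weight * high


def progressive_cutoffs(num_layers: int, start: float = 0.6,
                        end: float = 0.2) -> List[float]:
    """Hypothetical PFP-style schedule: one linearly varying cutoff per layer."""
    if num_layers == 1:
        return [start]
    step = (end - start) / (num_layers - 1)
    return [start + k * step for k in range(num_layers)]


def layerwise_loss(teacher_maps: List[Matrix], student_maps: List[Matrix]) -> float:
    """Sum the frequency-domain loss over layers, each with its own cutoff."""
    cutoffs = progressive_cutoffs(len(teacher_maps))
    return sum(
        frequency_distill_loss(t, s, c)
        for t, s, c in zip(teacher_maps, student_maps, cutoffs)
    )


if __name__ == "__main__":
    zeros = [[0.0] * 4 for _ in range(4)]
    checker = [[(-1.0) ** (i + j) for j in range(4)] for i in range(4)]
    # A checkerboard differs from an all-zero map only at the highest frequency,
    # so nearly all of the error should land in the high band.
    print(band_errors(zeros, checker, cutoff=0.5))
```

In this sketch a student that reproduces the teacher's coarse attention layout but misses fine, rapidly alternating structure is penalized mainly through the up-weighted high band, which is the intuition the abstract gives for HFED; letting the cutoff vary by layer mimics the abstract's observation that the frequency proportion differs across layers.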
What problem does this paper attempt to address?