Diversified Branch Fusion for Self-Knowledge Distillation

Zuxiang Long, Fuyan Ma, Bin Sun, Mingkui Tan, Shutao Li
DOI: https://doi.org/10.1016/j.inffus.2022.09.007
IF: 18.6
2023-01-01
Information Fusion
Abstract: Knowledge distillation improves the performance of a compact student network by adding supervision from a pre-trained, cumbersome teacher network during training. To avoid the resource cost of acquiring an extra teacher network, self-knowledge distillation uses a multi-branch network architecture with shared layers for the teacher and student models, which are trained collaboratively in a one-stage manner. However, this method ignores the knowledge of shallow branches and rarely provides diverse knowledge for effective collaboration among the branches. To address these two shortcomings, this paper proposes a novel Diversified Branch Fusion approach for Self-Knowledge Distillation (DBFSKD). First, we design lightweight networks that are attached to the middle layers of the backbone and capture discriminative information through global–local attention. Then we introduce a diversity loss between different branches to explore diverse knowledge. Moreover, the diverse knowledge is integrated into two knowledge sources by a Selective Feature Fusion (SFF) module and a Dynamic Logits Fusion (DLF) module. Thus, the significant knowledge of shallow branches is efficiently utilized, and all branches learn from each other through the fused knowledge sources. Extensive experiments with various backbone structures on four public datasets (CIFAR100, Tiny-ImageNet200, ImageNet, and RAF-DB) show that the proposed method outperforms other methods. More importantly, DBFSKD achieves even better performance with lower resource consumption than the baseline.
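The abstract does not give the formulas for the diversity loss or the Dynamic Logits Fusion, so the following is only a minimal sketch of the general idea: several branches produce logits, the logits are fused into one soft target, and every branch is distilled toward that target while a penalty discourages the branches from agreeing too closely. The weighting rule in `fuse_logits`, the cosine-based `diversity_penalty`, and all function names are assumptions made for illustration, not the authors' definitions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample given raw logits and the true class index."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def fuse_logits(branch_logits, label):
    """Hypothetical fusion rule (assumption): weight each branch by a softmax
    over its negative cross-entropy, so more accurate branches count more."""
    weights = softmax([-cross_entropy(z, label) for z in branch_logits])
    num_classes = len(branch_logits[0])
    fused = [
        sum(w * z[c] for w, z in zip(weights, branch_logits))
        for c in range(num_classes)
    ]
    return fused, weights


def distillation_losses(branch_logits, label, temperature=3.0):
    """KL divergence from the fused soft target to each branch's prediction."""
    fused, _ = fuse_logits(branch_logits, label)
    teacher = softmax(fused, temperature)
    return [kl_divergence(teacher, softmax(z, temperature)) for z in branch_logits]


def diversity_penalty(branch_logits):
    """Hypothetical diversity term (assumption): mean pairwise cosine similarity
    of branch predictions. Minimizing it pushes the branches apart."""
    probs = [softmax(z) for z in branch_logits]
    sims = []
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            dot = sum(a * b for a, b in zip(probs[i], probs[j]))
            norm_i = math.sqrt(sum(a * a for a in probs[i]))
            norm_j = math.sqrt(sum(b * b for b in probs[j]))
            sims.append(dot / (norm_i * norm_j))
    return sum(sims) / len(sims)
```

For three branches with logits `[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [3.0, 0.1, 0.2]]` and true class `0`, this fusion rule gives the largest weight to the third branch, because it has the lowest cross-entropy. In a training loop these terms would be added to each branch's ordinary classification loss; how the paper actually combines them is not stated in the abstract.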