B-AT-KD: Binary attention map knowledge distillation

Xing Wei, Yuqing Liu, Jiajia Li, Huiyong Chu, Zichen Zhang, Feng Tan, Pengwei Hu
DOI: https://doi.org/10.1016/j.neucom.2022.09.064
IF: 6
2022-01-01
Neurocomputing
Abstract: Convolutional neural networks (CNNs) have been used extensively in many applications and have been shown to be quite effective. As the depth and width of a network grow, prediction accuracy improves, but training complexity increases as well. Knowledge distillation (KD), which consists of a teacher network and a student network, was established to address this problem, and the widely used attention mechanism plays an important role in KD. We therefore build the Binary Attention Map Knowledge Distillation (B-AT-KD) model, which enhances the performance of the student network by supervising it with both the output of the teacher network and the associated attention maps with distinct semantics. In addition, to improve training results, we propose a new global loss function called KDM-Loss and use hyperparameter search to assign suitable weights. Finally, we compare B-AT-KD with state-of-the-art KD methods on the CIFAR 10, CIFAR 100, and Mini-ImageNet datasets. Experiments show that the proposed approach improves CIFAR 10 precision by 3.98%, Mini-ImageNet accuracy by 2.30%, and CIFAR 100 precision by 2.51% while reducing the number of parameters and computations by more than 50%.
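The abstract does not define KDM-Loss or say how the attention maps are binarized, so the following is only a minimal sketch of the general idea it describes: a student supervised by a weighted sum of a hard-label term, a soft-output distillation term, and an attention-map matching term. It uses the generic temperature-softened KL distillation and activation-based attention transfer, not the paper's formulation. All function names, the weights `w_ce`, `w_kd`, `w_at`, and the temperature value are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def cross_entropy(student_logits, label):
    """Hard-label term: negative log-probability of the true class."""
    return -log_softmax(student_logits)[label]


def kd_term(student_logits, teacher_logits, temperature=4.0):
    """Soft-output term: KL(teacher || student) on softened outputs, times T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def attention_map(feature):
    """Spatial attention map of a feature tensor.

    `feature` is a list of channels, each a flat list of spatial activations.
    The map is the channel-wise sum of squared activations, L2-normalized.
    (Assumption: generic attention transfer, with no binarization step.)
    """
    size = len(feature[0])
    amap = [sum(channel[i] ** 2 for channel in feature) for i in range(size)]
    norm = math.sqrt(sum(a * a for a in amap)) or 1.0
    return [a / norm for a in amap]


def attention_term(student_feature, teacher_feature):
    """L2 distance between the normalized student and teacher attention maps."""
    s = attention_map(student_feature)
    t = attention_map(teacher_feature)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def combined_loss(student_logits, teacher_logits, label,
                  student_feature, teacher_feature,
                  w_ce=1.0, w_kd=1.0, w_at=1.0, temperature=4.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_ce * cross_entropy(student_logits, label)
            + w_kd * kd_term(student_logits, teacher_logits, temperature)
            + w_at * attention_term(student_feature, teacher_feature))
```

In a setup like this, the three weights are the kind of quantity a hyperparameter search would tune. Note that the student and teacher features may have different channel counts, since the attention map collapses the channel dimension; only the spatial sizes need to match.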