Post-distillation Via Neural Resuscitation

Zhiqiang Bao, Zihao Chen, Chang-Dong Wang, Wei-Shi Zheng, Zhenhua Huang, Yunwen Chen
DOI: https://doi.org/10.1109/tmm.2023.3306601
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Knowledge distillation, a widely adopted model compression technique, distils knowledge from a large teacher model into a smaller student model, with the goal of reducing the computational resources the student model requires. However, most existing distillation approaches focus on the types of knowledge and how to distil them, neglecting the student model's neuronal responses to that knowledge. In this paper, we demonstrate that the Kullback-Leibler (KL) loss inhibits neuronal responses along the opposite gradient direction, which impairs the student model's potential during distillation. To address this problem, we introduce a principled dual-stage distillation scheme that rejuvenates all inhibited neurons at the neuronal level. In the first stage, we examine all neurons in the student model during standard distillation and divide them into two groups according to their responses. In the second stage, we propose three strategies that resuscitate the neurons in different ways, allowing us to exploit the full potential of the student model. Experiments across various knowledge distillation settings verify that the proposed approach outperforms current state-of-the-art approaches. Our work provides a neuronal perspective for studying how the student model responds to knowledge from the teacher model.
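The abstract's central claim concerns the direction of the KL distillation gradient. As a minimal sketch, assuming the common temperature-scaled loss T^2 * KL(p_t || p_s) (the paper's exact formulation is not given here), the gradient with respect to each student logit is T * (p_s - p_t), so gradient descent lowers every logit whose softened student probability exceeds the teacher's. The helper `split_units_by_gradient` is hypothetical: it treats output logits as stand-ins for neurons and splits them by gradient sign purely for illustration. It is not the authors' detection criterion, nor any of their three resuscitation strategies.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """Temperature-scaled distillation loss: T^2 * KL(p_teacher || p_student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def kd_kl_grad(student_logits, teacher_logits, temperature=4.0):
    """Analytic gradient of kd_kl_loss w.r.t. each student logit: T * (p_s - p_t)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return [temperature * (ps - pt) for ps, pt in zip(p_s, p_t)]


def split_units_by_gradient(grad):
    """Hypothetical split (not the paper's criterion): a positive gradient means
    gradient descent pushes that unit's logit down ("inhibited"); the remaining
    units are pushed up or left unchanged."""
    inhibited = [i for i, g in enumerate(grad) if g > 0]
    others = [i for i, g in enumerate(grad) if g <= 0]
    return inhibited, others


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # illustrative logits, not data from the paper
    student = [0.5, 1.5, 0.2]
    grad = kd_kl_grad(student, teacher)
    print("loss:", kd_kl_loss(student, teacher))
    print("gradient:", grad)
    print("inhibited units:", split_units_by_gradient(grad)[0])
```

Because the softened probabilities on each side sum to one, the gradient components sum to zero: raising one logit's target necessarily pushes at least one other logit down, which is the kind of suppression the abstract attributes to the KL loss.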
What problem does this paper attempt to address?