Input-Dependent Dynamical Channel Association for Knowledge Distillation.

Qiankun Tang, Yuan Zhang, Xiaogang Xu, Jun Wang, Yimin Guo
DOI: https://doi.org/10.1109/icassp49357.2023.10095107
2023-01-01
ICASSP
Abstract: Feature-map-based knowledge distillation has proven effective at improving the performance of the student model. Existing work focuses mainly on how knowledge is formulated and ignores the difference in channel counts that arises from the heterogeneous architectures of a teacher-student pair. Such methods generally adopt handcrafted matching or an input-independent association matrix, which leads to semantic mismatch and therefore suboptimal performance. To resolve this problem, we present an input-dependent channel association module. The module automatically generates an allocation matrix in a cross-attention manner, so that each student channel is dynamically connected to its semantically related teacher channel based on its learning state. An alternating training scheme is applied for stable optimization. Extensive experiments on image classification, across a variety of settings built on popular network architectures, demonstrate the effectiveness of the proposed strategy.
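The abstract gives no formulas, so the following is only a minimal NumPy sketch of how a cross-attention allocation matrix between student and teacher channels might be computed. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (channel_allocation, distillation_loss), the projection matrices w_q and w_k, the choice to flatten each channel into a vector, the softmax over teacher channels, and the mean-squared-error loss.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_allocation(student, teacher, w_q, w_k):
    """Hypothetical cross-attention allocation matrix.

    student: (Cs, H, W) feature map, teacher: (Ct, H, W) feature map.
    w_q, w_k: (H*W, d) projections (assumed, learned in practice).
    Returns a (Cs, Ct) matrix whose rows sum to 1.
    """
    s = student.reshape(student.shape[0], -1)   # one vector per student channel
    t = teacher.reshape(teacher.shape[0], -1)   # one vector per teacher channel
    q = s @ w_q                                 # queries come from the student
    k = t @ w_k                                 # keys come from the teacher
    scores = q @ k.T / np.sqrt(w_q.shape[1])    # scaled dot-product similarity
    return softmax(scores, axis=1)              # normalise over teacher channels

def distillation_loss(student, teacher, alloc):
    # Assumed loss: MSE between each student channel and its
    # allocation-weighted mixture of teacher channels.
    s = student.reshape(student.shape[0], -1)
    t = teacher.reshape(teacher.shape[0], -1)
    target = alloc @ t
    return float(np.mean((s - target) ** 2))

rng = np.random.default_rng(0)
student = rng.standard_normal((4, 8, 8))   # 4 student channels
teacher = rng.standard_normal((6, 8, 8))   # 6 teacher channels
w_q = rng.standard_normal((64, 16)) * 0.1
w_k = rng.standard_normal((64, 16)) * 0.1

alloc = channel_allocation(student, teacher, w_q, w_k)
loss = distillation_loss(student, teacher, alloc)
```

Because the queries and keys are computed from the current feature maps, the allocation matrix changes with every input, which is the property the abstract contrasts with a fixed, input-independent association matrix. The differing channel counts (4 versus 6 here) need no handcrafted pairing, since each student channel receives a weighted mixture of all teacher channels.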