Knowledge Distillation via Token-level Relationship Graph

Shuoxi Zhang, Hanpeng Liu, Kun He
2023-06-20
Abstract: Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the full potential of knowledge transfer has not yet been explored. Existing approaches primarily focus on distilling individual information or instance-level relationships, overlooking the valuable information embedded in token-level relationships, which may be particularly affected by the long-tail effect. To address these limitations, we propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG) that leverages token-wise relational knowledge to enhance the performance of knowledge distillation. By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model, resulting in improved distillation results. To further enhance the learning process, we introduce a token-wise loss called contextual loss, which encourages the student model to capture the inner-instance semantic context of the teacher model. We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches. Empirical results demonstrate the superiority of TRG across various visual classification tasks, including those involving imbalanced data. Our method consistently outperforms the existing baselines, establishing new state-of-the-art performance in the field of knowledge distillation.
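To make the idea of a token-level relationship graph more concrete, the Python sketch below builds, for a single instance, an N×N graph whose edge weights are the cosine similarities between that instance's N token embeddings, and penalizes the mean squared difference between the teacher's graph and the student's graph. This is a minimal illustration under stated assumptions, not the authors' TRG formulation: the use of cosine similarity, the squared-error matching term, and the names `cosine`, `token_graph`, and `graph_distill_loss` are all hypothetical, and the paper's actual graph construction (for example, how nodes are chosen or whether edges span several instances) may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity between two token embeddings (eps guards zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def token_graph(tokens: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical token-level relationship graph for ONE instance:
    an N x N matrix whose entry (i, j) is the cosine similarity of tokens i and j."""
    return [[cosine(u, v) for v in tokens] for u in tokens]


def graph_distill_loss(teacher_tokens, student_tokens) -> float:
    """Hypothetical matching term: mean squared difference between the teacher's
    and the student's token graphs. Embedding widths may differ; only the
    number of tokens N must agree."""
    n = len(teacher_tokens)
    if n == 0 or n != len(student_tokens):
        raise ValueError("teacher and student need the same, non-zero number of tokens")
    g_t = token_graph(teacher_tokens)
    g_s = token_graph(student_tokens)
    return sum((g_t[i][j] - g_s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    # Toy instance with N = 3 tokens: teacher width 3, student width 2.
    teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
    same_relations = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]   # same pairwise angles
    other_relations = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # different structure
    print(graph_distill_loss(teacher, same_relations))   # close to 0
    print(graph_distill_loss(teacher, other_relations))  # about 0.444
```

Because the graph is N×N whatever the embedding width, a 3-dimensional teacher and a 2-dimensional student can be compared directly without a projection layer. In the toy example the loss is near zero when the pairwise token relationships match, even though the raw features differ, which is the kind of relational (rather than feature-by-feature) signal the abstract describes.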
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to transfer information from the teacher model to the student model more effectively during knowledge distillation. Existing knowledge distillation methods mainly distill individual information or instance-level relationships and overlook the valuable information embedded in token-level relationships. The authors note that this token-level information may be particularly affected by the long-tail effect, so the omission can matter more on imbalanced datasets. To address these limitations, they propose Knowledge Distillation with Token-level Relationship Graph (TRG), which uses token-level relationship graphs to make knowledge distillation more effective. With TRG, the student model can emulate the higher-level semantic information captured by the teacher model, which improves distillation results. To further enhance learning, the authors introduce a token-wise loss called contextual loss, which encourages the student model to capture the inner-instance semantic context of the teacher model. Experiments on various visual classification tasks, including ones with imbalanced data, show that TRG consistently outperforms the state-of-the-art baselines it is compared against.