Complementary Relation Contrastive Distillation

Jinguo Zhu, Shixiang Tang, Dapeng Chen, Shijie Yu, Yakun Liu, Aijun Yang, Mingzhe Rong, Xiaohua Wang
DOI: https://doi.org/10.48550/arXiv.2103.16367
2021-03-29
Abstract: Knowledge distillation aims to transfer representation ability from a teacher model to a student model. Previous approaches focus on either individual representation distillation or inter-sample similarity preservation. We argue, however, that the inter-sample relation conveys abundant information and needs to be distilled in a more effective way. In this paper, we propose a novel knowledge distillation method, namely Complementary Relation Contrastive Distillation (CRCD), to transfer the structural knowledge from the teacher to the student. Specifically, we estimate the mutual relation in an anchor-based way and distill the anchor-student relation under the supervision of its corresponding anchor-teacher relation. To make it more robust, mutual relations are modeled by two complementary elements: the feature and its gradient. Furthermore, the lower bound of mutual information between the anchor-teacher relation distribution and the anchor-student relation distribution is maximized via a relation contrastive loss, which can distill both the sample representation and the inter-sample relations. Experiments on different benchmarks demonstrate the effectiveness of our proposed CRCD.
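The anchor-based relation and the relation contrastive loss described in the abstract can be illustrated with a small sketch. It is a hedged illustration, not CRCD's implementation. It assumes that a relation is the unit-normalized difference between an anchor embedding and a sample embedding in a shared space, that anchors are passed in as separate fixed embeddings, and that the other samples in the batch serve as negatives under a dot-product critic with a temperature. The paper's actual relation function, critic, and negative sampling may differ, and the names `anchor_relation` and `relation_contrastive_loss` are hypothetical. The link to mutual information is the standard InfoNCE bound: with N candidates per query, I ≥ log N − loss, so minimizing the loss maximizes a lower bound on the mutual information between anchor-teacher and anchor-student relations.

```python
import math
import random


def _normalize(u):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    n = math.sqrt(sum(a * a for a in u))
    return [a / n for a in u] if n > 0 else list(u)


def anchor_relation(anchor, feature):
    # Hypothetical relation: unit-normalized difference (anchor - feature).
    return _normalize([a - b for a, b in zip(anchor, feature)])


def relation_contrastive_loss(teacher_feats, student_feats, anchors, temperature=0.1):
    """InfoNCE-style loss over anchor-based relations (illustrative only).

    For each anchor and sample i, the anchor-student relation r_s[i] is scored
    against every anchor-teacher relation r_t[j] in the batch; j == i is the
    positive pair and all other j are negatives.
    """
    total, count = 0.0, 0
    for a in anchors:
        r_t = [anchor_relation(a, t) for t in teacher_feats]
        r_s = [anchor_relation(a, s) for s in student_feats]
        for i, rs in enumerate(r_s):
            # Cosine similarity (relations are unit vectors) divided by temperature.
            logits = [sum(x * y for x, y in zip(rs, rt)) / temperature for rt in r_t]
            m = max(logits)
            log_z = m + math.log(sum(math.exp(v - m) for v in logits))
            total += log_z - logits[i]  # -log softmax probability of the positive
            count += 1
    return total / count


if __name__ == "__main__":
    random.seed(0)
    dim, n = 8, 16
    teacher = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    anchors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
    close = [[x + random.gauss(0, 0.05) for x in t] for t in teacher]
    unrelated = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    for name, student in [("close", close), ("unrelated", unrelated)]:
        loss = relation_contrastive_loss(teacher, student, anchors)
        # InfoNCE: estimated mutual-information lower bound is log(N) - loss.
        print(name, round(loss, 3), round(math.log(n) - loss, 3))
```

A student whose features stay close to the teacher's yields anchor-student relations that match the anchor-teacher relations, so its loss is low and the estimated bound is high; an unrelated student scores near chance. Normalizing relations keeps the critic a bounded cosine similarity, which makes the temperature the only scale parameter.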
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem that existing knowledge distillation methods do not effectively preserve the relationships between samples when transferring knowledge from a teacher model to a student model. Existing methods mainly focus either on the representation of each individual sample or on preserving similarities between samples, and in the authors' view they fail to capture and transfer the richer structural information carried by inter-sample relations. As a result, the student model may underperform on tasks that depend on understanding the relationships between samples, such as retrieval and classification.

To solve this problem, the paper proposes a new knowledge distillation method, Complementary Relation Contrastive Distillation (CRCD). CRCD defines a new cross-space relation based on anchor samples: each anchor-student relation is supervised by the corresponding anchor-teacher relation, which transfers the structural knowledge of the teacher more effectively. The method therefore optimizes the sample representations while also preserving the relationships between samples. The main contributions of CRCD are:

1. **Defining a new anchor-based cross-space relation**: this relation allows the method to optimize the sample representations and the inter-sample relations simultaneously.
2. **Modeling relations with two complementary elements, features and their gradients**: the features capture structural information in the feature space, while the gradients capture the optimization dynamics.
3. **Maximizing the lower bound of the mutual information between the anchor-teacher relation and the anchor-student relation**: this is solved efficiently with a contrastive learning objective.
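The second contribution, modeling relations with both features and their gradients, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the gradient is that of a softmax cross-entropy loss taken with respect to the feature through a hypothetical linear classifier head `W`, and it forms a hypothetical complementary relation by concatenating the anchor-minus-sample differences in feature space and in gradient space. The helper names `feature_gradient` and `complementary_relation` are illustrative only.

```python
import math


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits_of(W, f):
    # Hypothetical linear head: one row of W per class.
    return [sum(w * x for w, x in zip(row, f)) for row in W]


def cross_entropy(W, f, y):
    # Softmax cross-entropy of feature f under head W for true class y.
    return -math.log(softmax(logits_of(W, f))[y])


def feature_gradient(W, f, y):
    """dL/df for L = CE(softmax(W f), y), i.e. W^T (p - onehot(y))."""
    p = softmax(logits_of(W, f))
    err = [p_c - (1.0 if c == y else 0.0) for c, p_c in enumerate(p)]
    return [sum(err[c] * W[c][d] for c in range(len(W))) for d in range(len(f))]


def complementary_relation(anchor_feat, anchor_grad, feat, grad):
    # Hypothetical: concatenate feature-space and gradient-space differences,
    # so the relation reflects both where a sample lies and how the loss
    # would push it.
    feat_part = [a - b for a, b in zip(anchor_feat, feat)]
    grad_part = [a - b for a, b in zip(anchor_grad, grad)]
    return feat_part + grad_part
```

A finite-difference check (perturbing one coordinate of the feature at a time) is a quick way to confirm that the analytic gradient `W^T (p - onehot(y))` is correct before using it to build relations.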
Through these contributions, CRCD performs well on multiple benchmarks and improves the performance of the student model.