CORSD: Class-Oriented Relational Self Distillation

Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang, Kaisheng Ma
2023-04-29
Abstract: Knowledge distillation is an effective model compression method, but it has some limitations: (1) feature-based distillation methods focus only on distilling the feature map and fail to transfer the relations among data examples; (2) relational distillation methods are either limited to handcrafted functions for relation extraction, such as the L2 norm, or weak at modeling inter- and intra-class relations. In addition, the feature divergence between heterogeneous teacher-student architectures may lead to inaccurate transfer of relational knowledge. In this work, we propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address these limitations. Trainable relation networks are designed to extract the relations of structured data input, and they enable the whole model to classify samples better by transferring relational knowledge from the deepest layer of the model to the shallow layers. In addition, auxiliary classifiers are proposed to make the relation networks capture class-oriented relations that benefit the classification task. Experiments demonstrate that CORSD achieves remarkable improvements: compared to the baseline, average accuracy gains of 3.8%, 1.5% and 4.5% are observed on CIFAR100, ImageNet and CUB-200-2011, respectively.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper attempts to solve two main problems in knowledge distillation methods:

1. **Limitations of feature distillation methods**: Existing feature-based knowledge distillation methods focus only on distilling feature maps and ignore the transfer of relationships between data samples, resulting in low distillation efficiency.
2. **Limitations of relationship distillation methods**: Although some works attempt to improve knowledge distillation by transferring the relationships between data samples, these methods usually rely on manually designed relationship extraction functions (such as the L2 norm or the inner product) and are weak at modeling inter-class contrast and intra-class similarity. In addition, the feature differences between teacher and student models of different architectures may make relationship transfer in the feature space inaccurate.

To solve these problems, the authors propose a new training framework named **Class-Oriented Relational Self Distillation (CORSD)**. Specifically, the main contributions of CORSD include:

1. **Design of trainable relationship networks**: These relationship networks extract the inter-class and intra-class relationships of structured inputs and transfer them from the deepest layer of the model to the shallow layers, thereby enhancing the classification ability of the model.
2. **Introduction of auxiliary classifiers**: These auxiliary classifiers help the relationship networks capture class-oriented relationships that benefit the classification task, further improving the efficiency of relationship distillation.
3. **Extensive experimental verification**: Experimental results show that CORSD significantly outperforms existing knowledge distillation methods on multiple datasets and models.

Through these improvements, CORSD makes more effective use of the relationships between samples, thereby achieving better performance in classification tasks.
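
To make the idea of relational distillation more concrete, the following is a minimal sketch of the handcrafted baseline that the summary contrasts CORSD with. It is not the authors' implementation. It builds a pairwise relation matrix for a batch using the L2 norm or the inner product, then penalizes the difference between the relation matrix of a shallow layer and that of the deepest layer, which mirrors the deep-to-shallow direction of self distillation. The function names, the mean normalization, and the mean-squared-error loss are all assumptions made for illustration; CORSD itself replaces the fixed `relation_fn` with trainable relation networks guided by auxiliary classifiers, which are not shown here.

```python
import math


def l2_relation(features):
    """Handcrafted relation: pairwise Euclidean distances within a batch."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def inner_product_relation(features):
    """Handcrafted relation: pairwise inner products within a batch."""
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(n)]
            for i in range(n)]


def normalize(matrix):
    """Divide by the mean off-diagonal entry (an illustrative choice) so that
    layers with different feature scales produce comparable relations."""
    n = len(matrix)
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(off_diagonal) / len(off_diagonal)
    if mean == 0:
        return matrix
    return [[value / mean for value in row] for row in matrix]


def relation_distillation_loss(shallow_features, deep_features,
                               relation_fn=l2_relation):
    """Mean squared error between the normalized relation matrices of a
    shallow layer (student role) and the deepest layer (teacher role)."""
    student = normalize(relation_fn(shallow_features))
    teacher = normalize(relation_fn(deep_features))
    n = len(student)
    return sum((student[i][j] - teacher[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Hypothetical toy batch: three samples with 2-D features per layer.
deep = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
shallow_similar = [[0.0, 0.0], [6.0, 8.0], [12.0, 16.0]]   # same geometry, scaled
shallow_different = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # different geometry

print(relation_distillation_loss(shallow_similar, deep))
print(relation_distillation_loss(shallow_different, deep))
```

Because the relation matrices are normalized, a shallow layer whose features are a uniformly scaled copy of the deep features incurs essentially no loss, while a layer that arranges the samples differently is penalized. Note that these fixed functions never see class labels, which is the gap the summary says the class-oriented, trainable relation networks are meant to close.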