Data Upcycling Knowledge Distillation for Image Super-Resolution

Yun Zhang, Wei Li, Simiao Li, Hanting Chen, Zhijun Tu, Wenjia Wang, Bingyi Jing, Shaohui Lin, Jie Hu
2024-04-28
Abstract: Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models. However, current KD methods for super-resolution (SR) networks overlook the nature of SR task that the outputs of the teacher model are noisy approximations to the ground-truth distribution of high-quality images (GT), which shades the teacher model's knowledge to result in limited KD effects. To utilize the teacher model beyond the GT upper-bound, we present the Data Upcycling Knowledge Distillation (DUKD), to transfer the teacher model's knowledge to the student model through the upcycled in-domain data derived from training data. Besides, we impose label consistency regularization to KD for SR by the paired invertible augmentations to improve the student model's performance and robustness. Comprehensive experiments demonstrate that the DUKD method significantly outperforms previous arts on several SR tasks.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the limited, and sometimes negative, effect of existing knowledge distillation (KD) methods on image super-resolution (SR). In SR, the teacher model's output is a noisy approximation of the true distribution of high-resolution images, so traditional KD methods struggle to transfer "dark knowledge" and the student model learns poorly. The paper therefore proposes a new KD framework, Data Upcycling Knowledge Distillation (DUKD), which aims to improve the student model by leveraging the information in the training data.

### Main Problems

1. **Limitations of Existing Knowledge Distillation Methods**:
   - Existing KD methods have limited effectiveness in SR because the teacher model's output is a noisy approximation of the true distribution of high-resolution images, which restricts effective knowledge transfer.
   - Directly aligning the model outputs may mislead the student model, because the teacher model's distributional information is overshadowed by the ground-truth labels.
2. **Insufficiency of Data Augmentation**:
   - Conventional data augmentation merely reuses the available image pairs, and the resulting "recycled" data does not separate the supervisory roles of the teacher model and the ground-truth labels.
   - Data-free KD avoids relying on the training data, but in doing so it discards data that is available, and the images it generates may make the teacher model's outputs less accurate.

### Solutions

The paper proposes the DUKD framework, which consists mainly of two modules:

1. **In-domain Data Upcycling**:
   - Auxiliary training samples are constructed through zoom-in and zoom-out operations. The teacher model generates the corresponding high-resolution labels, which guide the student model's learning.
   - These auxiliary samples are closely related to the training set, which prevents distribution shift and lets the student model learn more effectively from the teacher model's responses.
2. **Label Consistency Regularization**:
   - Selected invertible data augmentations are applied so that the student model keeps its predictions consistent under input perturbations.
   - This improves the robustness and generalization ability of the student model.

### Experimental Results

- **Quantitative Comparison**: The experimental results show that DUKD significantly outperforms existing KD methods on multiple SR tasks.
- **Visual Comparison**: In a comparison of output images from EDSR models trained with different methods, DUKD reconstructs details and textures better.

### Summary

After analyzing the limitations of existing KD methods for image SR, the paper proposes a new framework, DUKD. Through in-domain data upcycling and label consistency regularization, it improves the performance of the student model. The method applies not only to teacher-student pairs that share an architecture but also to heterogeneous settings.
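
The two modules described under Solutions can be illustrated with a minimal sketch. It is not the authors' implementation: images are plain 2-D lists of floats, the "models" are any callables that upscale an image, and every helper name (`upcycle`, `dukd_losses`, `downscale_mean`, and so on) is hypothetical. The sketch assumes that zoom-in means cropping an LR-sized patch from the HR image, that zoom-out means downsampling the LR image by the SR scale factor, that a horizontal flip stands in for the invertible augmentations, and that an L1 distance compares student and teacher outputs. None of these details is confirmed by the summary above.

```python
import random

def upscale_nearest(img, scale):
    """Toy stand-in for an SR model: nearest-neighbour upscaling."""
    return [[px for px in row for _ in range(scale)]
            for row in img for _ in range(scale)]

def downscale_mean(img, scale):
    """Average-pool a 2-D image by an integer factor."""
    h, w = len(img) // scale, len(img[0]) // scale
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            block = [img[i * scale + di][j * scale + dj]
                     for di in range(scale) for dj in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def crop(img, top, left, size):
    """Cut a size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

def hflip(img):
    """Horizontal flip; it is its own inverse."""
    return [row[::-1] for row in img]

def l1(a, b):
    """Mean absolute difference between two equally sized images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def upcycle(lr, hr, scale, rng):
    """Build two auxiliary LR inputs from one (lr, hr) training pair."""
    size = len(lr)
    top = rng.randrange(len(hr) - size + 1)
    left = rng.randrange(len(hr[0]) - size + 1)
    zoom_in = crop(hr, top, left, size)    # assumption: HR patch reused as an LR input
    zoom_out = downscale_mean(lr, scale)   # assumption: LR image shrunk further
    return zoom_in, zoom_out

def dukd_losses(teacher, student, lr, hr, scale, rng):
    """Distillation losses on upcycled inputs with a flip-consistency check."""
    losses = []
    for aux in upcycle(lr, hr, scale, rng):
        label = teacher(aux)               # the teacher's output serves as the label
        # Label consistency: flip the student's input, then undo the flip
        # on its output before comparing with the teacher's label.
        pred = hflip(student(hflip(aux)))
        losses.append(l1(pred, label))
    return losses

if __name__ == "__main__":
    rng = random.Random(0)
    hr = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
    lr = downscale_mean(hr, 2)

    def teacher(x):
        return upscale_nearest(x, 2)

    def student(x):
        # A deliberately imperfect student: half the teacher's intensity.
        return [[0.5 * v for v in row] for row in upscale_nearest(x, 2)]

    print(dukd_losses(teacher, student, lr, hr, 2, rng))
```

In a real training loop these two losses would be added, with some weighting, to the usual reconstruction loss on the original pair. The weighting and the set of augmentations are design choices that this sketch leaves open.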