Abstract: Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models. However, current KD methods for super-resolution (SR) networks overlook a basic property of the SR task: the outputs of the teacher model are noisy approximations of the ground-truth distribution of high-quality images (GT), which obscures the teacher model's knowledge and limits the effect of KD. To utilize the teacher model beyond the GT upper bound, we present Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher model's knowledge to the student model through upcycled in-domain data derived from the training data. In addition, we impose label consistency regularization on KD for SR through paired invertible augmentations to improve the student model's performance and robustness. Comprehensive experiments demonstrate that DUKD significantly outperforms prior art on several SR tasks.

What problem does this paper attempt to address?

The problem this paper attempts to solve is that, in the image super-resolution task, existing knowledge distillation methods have limited effectiveness or even a negative impact. Specifically, traditional knowledge distillation methods struggle to transfer "dark knowledge" in this task because the output of the teacher model is a noisy approximation of the true distribution of high-resolution images, which leads to poor learning performance in the student model. The paper therefore proposes a new knowledge distillation framework, Data Upcycling Knowledge Distillation (DUKD), which aims to improve the performance of the student model by making better use of the information in the training data.

### Main Problems

1. **Limitations of Existing Knowledge Distillation Methods**:
   - Existing knowledge distillation methods have limited effectiveness in the image super-resolution task because the output of the teacher model is a noisy approximation of the true distribution of high-resolution images, which restricts the effective transfer of knowledge.
   - Directly aligning the model outputs may mislead the student model because the distribution information of the teacher model is masked by the true labels.
2. **Insufficiency of Data Augmentation**:
   - Conventional data augmentation methods merely reuse the available image pairs, and the resulting "recycled" data is not sufficient to separate the supervisory roles of the teacher model and the true labels.
   - Data-free knowledge distillation avoids referring to the training data, but in doing so it discards training data that is available, and the generated images may lead to less accurate outputs from the teacher model.

### Solutions

The paper proposes the DUKD framework, which mainly consists of two modules:

1. **In-domain Data Upcycling**:
   - Construct auxiliary training samples through zoom-in and zoom-out operations, and let the teacher model generate the corresponding high-resolution labels, which then guide the learning of the student model.
   - These auxiliary samples are closely related to the training set, which prevents distribution shift and lets the student model learn more effectively from the responses of the teacher model.
2. **Label Consistency Regularization**:
   - Introduce label consistency regularization: through selected invertible data augmentations, the student model is trained to keep its predictions consistent under input perturbations.
   - This improves the robustness and generalization ability of the student model.

### Experimental Results

- **Quantitative Comparison**: The experimental results show that the DUKD framework significantly outperforms existing knowledge distillation methods on multiple super-resolution tasks.
- **Visual Comparison**: In a comparison of output images from EDSR models trained with different methods, DUKD shows better reconstruction of detail and texture.

### Summary

By analyzing the limitations of existing knowledge distillation methods in the image super-resolution task, the paper proposes a new framework, DUKD. Through in-domain data upcycling and label consistency regularization, it effectively improves the performance of the student model. The method applies not only to teacher-student configurations with the same architecture but also to heterogeneous settings.
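
The two modules described above (auxiliary samples built by zoom-in and zoom-out, and label consistency under invertible augmentation) can be illustrated with a small sketch. This is not the authors' implementation. It is a toy in pure Python under several assumptions: images are nested lists of grayscale values, zoom-in is taken to mean reusing a cropped HR patch as a new low-resolution input, zoom-out is taken to mean downsampling the LR image further, the only augmentation is a horizontal flip, and the hypothetical `teacher` and `student` are plain nearest-neighbor upsamplers standing in for SR networks. All function names are illustrative.

```python
import random

def downsample(img, scale):
    """Average-pool by `scale` (a stand-in for bicubic downsampling)."""
    h, w = len(img) // scale, len(img[0]) // scale
    return [[sum(img[i * scale + di][j * scale + dj]
                 for di in range(scale) for dj in range(scale)) / scale ** 2
             for j in range(w)] for i in range(h)]

def upsample(img, scale):
    """Nearest-neighbor upsampling (a stand-in for an SR network)."""
    return [[img[i // scale][j // scale]
             for j in range(len(img[0]) * scale)]
            for i in range(len(img) * scale)]

def crop(img, top, left, height, width):
    return [row[left:left + width] for row in img[top:top + height]]

def hflip(img):
    """Horizontal flip; it is its own inverse."""
    return [row[::-1] for row in img]

def l1(a, b):
    """Mean absolute difference between two same-sized images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def upcycle(lr, hr, scale, rng):
    """Build auxiliary inputs from one (lr, hr) pair (assumed definitions)."""
    h, w = len(lr), len(lr[0])
    top = rng.randrange(len(hr) - h + 1)
    left = rng.randrange(len(hr[0]) - w + 1)
    zoom_in = crop(hr, top, left, h, w)   # HR patch reused as an LR input
    zoom_out = downsample(lr, scale)      # LR image shrunk once more
    return [zoom_in, zoom_out]

def distill_loss(teacher, student, aux_inputs):
    """Teacher labels the upcycled inputs; the student sees a flipped copy,
    and its output is flipped back before comparison (label consistency)."""
    total = 0.0
    for x in aux_inputs:
        target = teacher(x)
        pred = hflip(student(hflip(x)))   # augment, predict, invert
        total += l1(pred, target)
    return total / len(aux_inputs)

# Toy usage: an 8x8 "HR" ramp image and its x2 "LR" counterpart.
scale = 2
hr = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
lr = downsample(hr, scale)
teacher = lambda x: upsample(x, scale)
student = lambda x: upsample(x, scale)
aux = upcycle(lr, hr, scale, random.Random(0))
loss = distill_loss(teacher, student, aux)
```

Because the auxiliary inputs have no ground-truth HR counterpart, the teacher's output is the only supervision for them, which is what separates the teacher's role from that of the true labels. In this toy the student equals the teacher and nearest-neighbor upsampling commutes with flipping, so the loss is zero; a real student network would be trained to drive this term down alongside the usual reconstruction loss.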

Data Upcycling Knowledge Distillation for Image Super-Resolution

DCCD: Reducing Neural Network Redundancy Via Distillation

MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution

FAKD: Feature-Affinity Based Knowledge Distillation for Efficient Image Super-Resolution

DSRKD: Joint Despecking and Super-Resolution of SAR Images Via Knowledge Distillation

Knowledge Distillation based Degradation Estimation for Blind Super-Resolution

An Embarrassingly Simple Approach for Knowledge Distillation

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition

Knowledge Augmentation for Distillation: A General and Effective Approach to Enhance Knowledge Distillation

Online Knowledge Distillation via Collaborative Learning

Role-Wise Data Augmentation for Knowledge Distillation

One Step Diffusion-based Super-Resolution with Time-Aware Distillation

DDistill-SR: Reparameterized Dynamic Distillation Network for Lightweight Image Super-Resolution

Efficient knowledge distillation for hybrid models: A vision transformer‐convolutional neural network to convolutional neural network approach for classifying remote sensing images

Revisiting Knowledge Distillation Via Label Smoothing Regularization

Improving Knowledge Distillation With a Customized Teacher

Improve Knowledge Distillation via Label Revision and Data Selection

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks