Adversarial Diffusion Compression for Real-World Image Super-Resolution

Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang
2024-11-20
Abstract: Real-world image super-resolution (Real-ISR) aims to reconstruct high-resolution images from low-resolution inputs degraded by complex, unknown processes. While many Stable Diffusion (SD)-based Real-ISR methods have achieved remarkable success, their slow, multi-step inference hinders practical deployment. Recent SD-based one-step networks like OSEDiff and S3Diff alleviate this issue but still incur high computational costs due to their reliance on large pretrained SD models. This paper proposes a novel Real-ISR method, AdcSR, by distilling the one-step diffusion network OSEDiff into a streamlined diffusion-GAN model under our Adversarial Diffusion Compression (ADC) framework. We meticulously examine the modules of OSEDiff, categorizing them into two types: (1) Removable (VAE encoder, prompt extractor, text encoder, etc.) and (2) Prunable (denoising UNet and VAE decoder). Since direct removal and pruning can degrade the model's generation capability, we pretrain our pruned VAE decoder to restore its ability to decode images and employ adversarial distillation to compensate for performance loss. This ADC-based diffusion-GAN hybrid design effectively reduces complexity by 73% in inference time, 78% in computation, and 74% in parameters, while preserving the model's generation capability. Experiments demonstrate that our proposed AdcSR achieves competitive recovery quality on both synthetic and real-world datasets, offering up to 9.3$\times$ speedup over previous one-step diffusion-based methods. Code and models will be made available.
Image and Video Processing, Computer Vision and Pattern Recognition
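The abstract reports its savings as percentage reductions, while the headline speed figure is a multiplier. As a small worked check, assuming the percentages are measured against the OSEDiff teacher, a fractional reduction r corresponds to a factor of 1 / (1 - r). The helper name below is illustrative only.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional cost reduction (e.g. 0.73) into an equivalent speed-up factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)

# Reductions quoted in the abstract (assumed here to be relative to OSEDiff).
REDUCTIONS = {"inference time": 0.73, "computation (MACs)": 0.78, "parameters": 0.74}

for name, r in REDUCTIONS.items():
    print(f"{name}: {r:.0%} lower -> {speedup_from_reduction(r):.1f}x smaller/faster")
```

Under that assumption, a 73% cut in inference time is roughly a 3.7x speed-up over OSEDiff, so the "up to 9.3x" figure presumably refers to a slower one-step baseline rather than to OSEDiff itself.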
What problem does this paper attempt to address?
### The problems the paper attempts to solve

This paper aims to solve two main problems in real-world image super-resolution (Real-ISR) for practical scenarios:

1. **High computational cost**: Multi-step diffusion-based Real-ISR methods generate high-quality images, but their slow, multi-step inference makes them expensive. Recent one-step methods such as OSEDiff and S3Diff remove most of that cost, yet they still rely on large pretrained Stable Diffusion (SD) models, which limits their deployment in practical applications, especially on resource-limited edge devices.
2. **Low inference speed**: Although one-step diffusion sampling (as in OSEDiff) shortens inference, the reliance on large-scale pretrained diffusion models still keeps inference too slow for real-time applications.

### The method proposed in the paper

To solve these problems, the paper proposes AdcSR, which compresses and optimizes the existing OSEDiff model under the Adversarial Diffusion Compression (ADC) framework. Specifically, AdcSR achieves efficient, high-quality Real-ISR through the following steps:

1. **Module removal**:
   - **Remove the VAE encoder**: Process the low-resolution (LR) input image directly with a PixelUnshuffle operation, which rearranges pixels into channels without discarding any information.
   - **Remove the text and time modules**: Remove the prompt extractor, text encoder, cross-attention (CA) layers and time embedding layer, because these modules contribute little to the Real-ISR task.
2. **Module pruning**:
   - **Optimize the UNet-VAE decoder connection**: Remove the output layer of the UNet and the input layer of the VAE decoder, and introduce a convolutional layer that connects their high-dimensional features directly, so information no longer has to pass through the low-dimensional latent between them.
   - **Prune feature channels**: Retain 75% of the feature channels in the UNet and 50% of those in the VAE decoder, reducing model complexity while keeping the original network depth to preserve performance.
3. **Two-stage training scheme**:
   - **First stage, pre-training the pruned VAE decoder**: Freeze the pretrained SD VAE encoder and train only the pruned VAE decoder with a reconstruction loss and an adversarial loss, restoring its ability to decode images.
   - **Second stage, distillation with adversarial loss**: Transfer the knowledge of the pretrained OSEDiff model (teacher) to the AdcSR model (student) through feature-space distillation, and add an adversarial loss to further improve the quality of the generated images.

### Experimental results

The experiments show that AdcSR substantially improves inference speed and computational efficiency while maintaining high-quality image restoration:

- **Inference time**: AdcSR needs only 0.03 seconds, a 73% reduction relative to OSEDiff and up to 9.3× faster than previous one-step diffusion-based methods.
- **Computational cost**: AdcSR requires only 496G multiply-accumulate operations (MACs), a 78% reduction.
- **Number of parameters**: AdcSR has 456M parameters, a 74% reduction.
- **Visual quality**: Experiments on synthetic and real-world datasets show that AdcSR is competitive in restoring details and in overall quality.

### Summary

By proposing AdcSR, the paper addresses the computational-cost and inference-speed bottlenecks of existing diffusion-based Real-ISR methods, offering an efficient, high-quality solution for practical applications.
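The PixelUnshuffle step that replaces the VAE encoder can be illustrated with a minimal NumPy re-implementation of the rearrangement PyTorch exposes as `torch.nn.PixelUnshuffle` (the code is intended to follow its channel ordering). The function names and the factor `r=2` are assumptions for illustration: for ×4 super-resolution with SD's 8× downsampling VAE, the latent is 1/8 of the HR size, i.e. 1/2 of the LR size, so `r=2` would match that resolution; the summary above does not state the factor AdcSR actually uses.

```python
import numpy as np

def pixel_unshuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C, H, W) image into (C*r*r, H/r, W/r) without losing any values."""
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("H and W must be divisible by r")
    # Split each spatial axis into (blocks, offset), then move both offsets into the channel axis.
    return (x.reshape(c, h // r, r, w // r, r)
             .transpose(0, 2, 4, 1, 3)
             .reshape(c * r * r, h // r, w // r))

def pixel_shuffle(y: np.ndarray, r: int) -> np.ndarray:
    """Exact inverse of pixel_unshuffle: (C*r*r, H, W) -> (C, H*r, W*r)."""
    cr2, h, w = y.shape
    c = cr2 // (r * r)
    return (y.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))

# Hypothetical 3-channel 64x64 LR input, unshuffled with r=2.
lr = np.random.default_rng(0).random((3, 64, 64))
feat = pixel_unshuffle(lr, r=2)
print(feat.shape)                                  # (12, 32, 32)
print(np.array_equal(pixel_shuffle(feat, 2), lr))  # True: the rearrangement is lossless
```

Unlike a learned encoder, this stem has no parameters and is exactly invertible, which is why it avoids the information loss mentioned above.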
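The channel-pruning ratios and the new UNet-decoder bridge translate into concrete layer widths. The sketch below assumes uniform per-stage pruning of the public Stable Diffusion 1.x/2.x widths (UNet stages 320/640/1280/1280, VAE decoder stages 128/256/512/512) and a hypothetical bridge from the UNet's final 320-channel stage to the decoder's widest 512-channel stage; the paper's exact pruning pattern may differ.

```python
# Stage widths from the public Stable Diffusion 1.x/2.x configs (assumed to match OSEDiff's backbone).
SD_UNET_BLOCK_CHANNELS = (320, 640, 1280, 1280)
SD_VAE_DECODER_CHANNELS = (128, 256, 512, 512)
SD_LATENT_CHANNELS = 4

def prune_widths(channels, keep_ratio):
    """Keep `keep_ratio` of the channels in every stage (hypothetical uniform scheme)."""
    return tuple(int(round(c * keep_ratio)) for c in channels)

def conv_weight_fraction(keep_ratio):
    """Fraction of a conv layer's weights left when both its input and output widths shrink by keep_ratio."""
    return keep_ratio ** 2

unet_pruned = prune_widths(SD_UNET_BLOCK_CHANNELS, 0.75)   # 75% of UNet channels kept
dec_pruned = prune_widths(SD_VAE_DECODER_CHANNELS, 0.50)   # 50% of decoder channels kept

# Hypothetical bridge conv: the UNet's last up-stage (first-stage width) feeds the decoder's widest
# stage directly, replacing UNet conv_out (320 -> 4) and decoder conv_in (4 -> 512).
bridge_in, bridge_out = unet_pruned[0], dec_pruned[-1]

print("pruned UNet widths:", unet_pruned)
print("pruned decoder widths:", dec_pruned)
print(f"bridge conv: {bridge_in} -> {bridge_out} channels (vs. a {SD_LATENT_CHANNELS}-channel latent)")
print(f"conv weights kept: UNet {conv_weight_fraction(0.75):.4f}, decoder {conv_weight_fraction(0.5):.2f}")
```

Because a dense convolution's weight count scales with input width times output width, keeping 75% of the channels leaves roughly 56% of a UNet conv layer's weights and keeping 50% leaves roughly 25% in the decoder, while the depth (number of layers) is unchanged. The bridge carries 240 channels instead of squeezing the features through the 4-channel SD latent.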
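Both training stages pair a fidelity term with an adversarial term. A minimal NumPy sketch of the two generator-side objectives follows. The use of L1 for reconstruction, MSE for feature distillation, the non-saturating GAN loss and `lambda_adv=0.1` are all assumptions, the function names are hypothetical, and the discriminator that produces `fake_logits` is left abstract.

```python
import numpy as np

def generator_adv_loss(fake_logits: np.ndarray) -> float:
    """Non-saturating GAN loss for the generator: mean softplus(-D(fake))."""
    # np.logaddexp(0, -z) == log(1 + exp(-z)), computed stably.
    return float(np.mean(np.logaddexp(0.0, -fake_logits)))

def stage1_decoder_loss(decoded, target, fake_logits, lambda_adv=0.1):
    """Stage 1 (sketch): L1 reconstruction of the image from frozen-encoder latents + adversarial term."""
    recon = float(np.mean(np.abs(decoded - target)))
    return recon + lambda_adv * generator_adv_loss(fake_logits)

def stage2_distill_loss(student_feat, teacher_feat, fake_logits, lambda_adv=0.1):
    """Stage 2 (sketch): feature-space MSE to the frozen OSEDiff teacher + adversarial term."""
    distill = float(np.mean((student_feat - teacher_feat) ** 2))
    return distill + lambda_adv * generator_adv_loss(fake_logits)

# Toy usage with random arrays standing in for images, features and discriminator logits.
rng = np.random.default_rng(0)
hr = rng.random((3, 64, 64))
print(stage1_decoder_loss(hr, hr, np.zeros(8)))  # perfect reconstruction: 0.1 * log(2) ≈ 0.0693
print(stage2_distill_loss(rng.random((240, 32, 32)), rng.random((240, 32, 32)), rng.normal(size=8)))
```

Only the gradients with respect to the student (the pruned decoder in stage 1, the whole AdcSR network in stage 2) would be used here; the discriminator is trained with its own loss in alternating steps, as in standard GAN training.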