D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods

Onkar Susladkar, Gayatri Deshmukh, Sparsh Mittal, Parth Shastri
2024-08-07
Abstract: In image processing, one of the most challenging tasks is to render an image's semantic meaning using a variety of artistic approaches. Existing techniques for arbitrary style transfer (AST) frequently experience mode-collapse, over-stylization, or under-stylization due to a disparity between the style and content images. We propose a novel framework called D$^2$Styler (Discrete Diffusion Styler) that leverages the discrete representational capability of VQ-GANs and the advantages of discrete diffusion, including stable training and avoidance of mode collapse. Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process. This makes it easy to move features from the style image to the content image without bias. The proposed method substantially enhances the visual quality of style-transferred images, allowing the combination of content and style in a visually appealing manner. We take style images from the WikiArt dataset and content images from the COCO dataset. Experimental results demonstrate that D$^2$Styler produces high-quality style-transferred images and outperforms twelve existing methods on nearly all the metrics. The qualitative results and ablation studies provide further insights into the efficacy of our technique. The code is available at https://github.com/Onkarsus13/D2Styler.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper targets several key challenges in image style transfer (ST), specifically mode collapse, over-stylization, and under-stylization in the arbitrary style transfer (AST) task. These problems typically arise when the style image and the content image differ substantially. The paper proposes a framework named D2Styler, which combines discrete diffusion with Vector-Quantized Generative Adversarial Networks (VQ-GANs) to improve arbitrary style transfer. The main contributions are:

1. **Combining discrete diffusion with AdaIN**: Pairing discrete diffusion with Adaptive Instance Normalization (AdaIN) addresses the mode collapse and over-stylization common in existing style transfer methods. The combination stabilizes training and preserves the content when diverse artistic styles are applied.
2. **Using an AdaIN-feature-guided diffusion decoder**: The content and style feature statistics extracted by the AdaIN layer condition the diffusion decoder. This gives finer control over the style transfer process, so the output reflects the target style attributes while keeping the content intact.
3. **Introducing a new loss function**: A new loss function \( L_{\text{feature}} \) drives the output image to match the features of the AdaIN layer. It is used together with the style loss and the content loss so that the model generates realistic stylized images.
4. **Performance evaluation**: Across multiple benchmarks, D2Styler is reported to outperform twelve existing style transfer methods on nearly all metrics, including SSIM and LPIPS. D2Styler also reaches high-quality results in fewer diffusion steps, which lowers the computational cost and makes real-time use more feasible.
5. **Multi-style application**: D2Styler can apply multiple styles to a single content image, which widens the range of artistic outputs.
6. **Ablation study**: An ablation study measures how much each component contributes to the performance of D2Styler.

Overall, the paper reports improved quality and stability for arbitrary style transfer, with applications in image processing and artistic creation.
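
Contributions 1 and 2 rest on AdaIN feature statistics. For readers unfamiliar with the operation, the following is a minimal sketch of standard AdaIN, which rescales each content channel so that its mean and standard deviation match those of the corresponding style channel: AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y). It is not taken from the D2Styler code. The function names `channel_stats` and `adain`, the `eps` value, and the toy feature values are assumptions made for illustration, and real feature maps would come from an encoder rather than hand-written lists.

```python
import math


def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one flattened feature channel.

    eps is added to the variance for numerical stability (assumed value).
    """
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Align each content channel's mean/std with the matching style channel.

    content, style: lists of channels; each channel is a flat list of floats.
    The two inputs must have the same number of channels.
    """
    if len(content) != len(style):
        raise ValueError("content and style need the same number of channels")
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan, eps)
        s_mean, s_std = channel_stats(s_chan, eps)
        # Normalize the content channel, then apply the style statistics.
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return out


# Toy two-channel "feature maps" (hypothetical values, flattened spatially).
content = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.0, 12.0, 12.0]]
style = [[5.0, 7.0, 9.0, 11.0], [-1.0, 0.0, 0.0, 1.0]]
stylized = adain(content, style)
```

After the call, each channel of `stylized` keeps the spatial ordering of the content values but carries the style channel's mean and (up to the `eps` term) its standard deviation. In the paper's framing, features of this kind serve as the context that guides the reverse diffusion process; how they are injected into the decoder is not shown here.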