Training-Free Diffusion Models for Content-Style Synthesis

Ruipeng Xu, Fei Shen, Xu Xie, Zongyi Li
DOI: https://doi.org/10.1007/978-981-97-5609-4_24
2024-01-01
Abstract: Recent advancements have highlighted the significant potential of controlled diffusion models for personalized image generation. However, current style transfer methods require retraining or fine-tuning the diffusion model on a style dataset. These methods, although effective, require significant resources and do not always guarantee optimal results with the newly adjusted weights. To address these challenges, we introduce the Progressive Style Transfer Diffusion Model (PSTDM), a two-stage approach that performs the transfer task without additional training. The method uses image segmentation labels to divide the image into regions to be altered (change regions) and regions to be preserved (hold regions). In the first stage, a style injection module generates the stylistic features of the change region, guided by stylized images, to perform a coarse style injection. In the second stage, a semantic retention module generates the target transfer image from both the hold region features and the coarse style features. The two stages of PSTDM work in tandem to progressively produce a final composite image of high quality and fidelity. Extensive experiments demonstrate that PSTDM achieves both content and style synthesis while remaining consistent with the details of the source image.
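The split into change and hold regions can be illustrated with a minimal sketch. This is not the authors' implementation: PSTDM operates on features inside a diffusion model, whereas the example below only composites pixels, and the names split_regions, composite, and change_ids are hypothetical. It assumes a segmentation label map with one integer class id per pixel.

```python
from typing import Iterable

import numpy as np


def split_regions(labels: np.ndarray, change_ids: Iterable[int]) -> np.ndarray:
    """Return a boolean mask that is True for pixels in the change region.

    Assumption: `labels` is an (H, W) integer label map and `change_ids`
    lists the class ids that should be restyled. All other pixels form
    the hold region.
    """
    return np.isin(labels, list(change_ids))


def composite(source: np.ndarray, stylized: np.ndarray,
              change_mask: np.ndarray) -> np.ndarray:
    """Take stylized pixels in the change region, source pixels elsewhere."""
    mask = change_mask[..., None]  # add a channel axis so the mask broadcasts
    return np.where(mask, stylized, source)


# Toy 2x2 example: class 1 is the change region, classes 0 and 2 are held.
labels = np.array([[0, 1], [1, 2]])
source = np.zeros((2, 2, 3), dtype=np.uint8)
stylized = np.full((2, 2, 3), 255, dtype=np.uint8)

change_mask = split_regions(labels, {1})
result = composite(source, stylized, change_mask)
```

In this toy case the two pixels labeled 1 take the stylized values and the other two keep the source values, which mirrors the role the hold region plays in preserving source detail.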