Abstract: Image style transfer aims to imbue digital imagery with the distinctive attributes of style targets, such as colors, brushstrokes, and shapes, whilst concurrently preserving the semantic integrity of the content. Despite the advancements in arbitrary style transfer methods, a prevalent challenge remains the delicate equilibrium between content semantics and style attributes. Recent developments in large-scale text-to-image diffusion models have heralded unprecedented synthesis capabilities, albeit at the expense of relying on extensive and often imprecise textual descriptions to delineate artistic styles. Addressing these limitations, this paper introduces DiffStyler, a novel approach that facilitates efficient and precise arbitrary image style transfer. At its core, DiffStyler uses a LoRA built on a text-to-image Stable Diffusion model to encapsulate the essence of style targets. This approach, coupled with strategic cross-LoRA feature and attention injection, guides the style transfer process. The foundation of our methodology is rooted in the observation that LoRA maintains the spatial feature consistency of UNet, a discovery that further inspired the development of a mask-wise style transfer technique. This technique employs masks extracted through a pre-trained FastSAM model, utilizing mask prompts to facilitate feature fusion during the denoising process, thereby enabling localized style transfer that preserves the original image's unaffected regions. Moreover, our approach accommodates multiple style targets through the use of corresponding masks. Through extensive experimentation, we demonstrate that DiffStyler surpasses previous methods in achieving a more harmonious balance between content preservation and style integration.

What problem does this paper attempt to address?

The paper addresses a central challenge in image style transfer: how to transfer the style of one image onto another accurately while preserving the content semantics. Existing arbitrary style transfer methods often struggle to balance content and style attributes, and text-to-image diffusion models depend on lengthy, often imprecise textual descriptions of a style. The paper introduces DiffStyler, a diffusion-based method for localized image style transfer. DiffStyler uses a text-to-image Stable Diffusion model with Low-Rank Adaptation (LoRA) to capture the essence of each style image, and guides the style transfer process through cross-LoRA feature and attention injection. Because LoRA maintains the spatial feature consistency of the UNet, features can be fused region by region during denoising using masks extracted by a pre-trained FastSAM model. This enables localized style transfer that leaves the unmasked regions of the original image unaffected, and it supports multiple style targets, each with its own mask. According to the authors' experiments, DiffStyler achieves a better balance between content preservation and style integration than previous style transfer methods.
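The mask-wise fusion idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_features`, the array shapes, and the hard (binary) blending rule are assumptions made for illustration, and the toy arrays stand in for UNet feature maps and FastSAM masks. It only shows how per-style features could replace content features inside each mask while everything outside the masks stays untouched.

```python
import numpy as np

def fuse_features(content_feat, style_feats, masks):
    """Blend per-style feature maps into content features, region by region.

    content_feat: (C, H, W) array from the unstyled (content) branch.
    style_feats:  list of (C, H, W) arrays, one per style target.
    masks:        list of (H, W) binary arrays, one per style target.
    All names and shapes here are hypothetical, chosen for illustration.
    """
    fused = content_feat.copy()
    for feat, mask in zip(style_feats, masks):
        # Add a channel axis so the mask broadcasts over all C channels.
        m = mask.astype(content_feat.dtype)[None, :, :]
        # Inside the mask take the styled features; outside keep what we have.
        fused = m * feat + (1.0 - m) * fused
    return fused

# Toy stand-ins for feature maps: content is all 0, style A all 1, style B all 2.
content = np.zeros((2, 4, 4))
style_a = np.ones((2, 4, 4))
style_b = np.full((2, 4, 4), 2.0)

# Style A covers the left half; style B covers the top-right quadrant.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[:, :2] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[:2, 2:] = True

fused = fuse_features(content, [style_a, style_b], [mask_a, mask_b])
```

In this toy case the bottom-right quadrant is covered by no mask, so it keeps the content features, which mirrors the claim that regions outside the masks are preserved. In a diffusion pipeline such a blend would presumably be applied to intermediate features at denoising steps, with masks resized to each feature resolution; those details are not specified in the abstract.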

DiffStyler: Diffusion-based Localized Image Style Transfer

Diverse Image Style Transfer Via Invertible Cross-Space Mapping

Learning Structure-Aware Transformations for Arbitrary Image Style Transfer

Optimal Transport of Deep Feature for Image Style Transfer

Style Permutation for Diversified Arbitrary Style Transfer

Artistic Style Transfer with Internal-external Learning and Contrastive Learning

UATST: Towards Unpaired Arbitrary Text-Guided Style Transfer with Cross-Space Modulation

Diversified Patch-based Style Transfer with Shifted Style Normalization

Correlation-based and Content-Enhanced Network for Video Style Transfer

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

Image Neural Style Transfer with Preserving the Salient Regions

Name Your Style: An Arbitrary Artist-aware Image Style Transfer

Real-time Localized Photorealistic Video Style Transfer

Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate

Domain-Aware Universal Style Transfer

Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object

D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods

Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning