Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object

Junhao Chen, Peng Rong, Jingbo Sun, Chao Li, Xiang Li, Hongwu Lv

2023-11-29

Abstract: Image style transfer occupies an important place in both computer graphics and computer vision. However, most current methods require reference to stylized images and cannot individually stylize specific objects. To overcome this limitation, we propose the "Soulstyler" framework, which allows users to guide the stylization of specific objects in an image through simple textual descriptions. We introduce a large language model to parse the text and identify stylization goals and specific styles. Combined with a CLIP-based semantic visual embedding encoder, the model understands and matches text and image content. We also introduce a novel localized text-image block matching loss that ensures that style transfer is performed only on specified target objects, while non-target regions remain in their original style. Experimental results demonstrate that our model is able to accurately perform style transfer on target objects according to textual descriptions without affecting the style of background regions. Our code will be available at https://github.com/yisuanwang/Soulstyler.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper proposes Soulstyler, a framework for text-guided image style transfer that targets individual objects. Most existing image style transfer methods require a reference stylized image and cannot stylize a specific object independently of the rest of the image. Soulstyler addresses this by letting users guide the stylization of specific objects through simple text descriptions. Its main features are:

1. **Combining large language models with visual encoders**: Soulstyler uses a large language model (such as GPT-4 or LLAMA-2) to parse the text input and identify both the stylization target and the desired style. A CLIP-based semantic visual embedding encoder then matches the text against the image content.
2. **Localized text-image patch matching loss function**: This loss restricts style transfer to the specified target objects, while non-target areas retain their original style.
3. **Experimental results**: The paper reports that Soulstyler accurately stylizes target objects according to text descriptions without affecting the style of background areas.

In summary, Soulstyler gives finer control over which objects in an image are stylized, which is useful for digital art creation, personalized content generation, and similar applications.
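The idea behind the localized patch matching loss can be illustrated with a minimal sketch. It is not the paper's exact formulation, which this summary does not give. It assumes a CLIPstyler-style directional loss (one minus the cosine between the image-embedding change and the text-embedding change) and simply skips patches that fall mostly outside the target mask. The function name, the `mask_cover` input, and the 0.5 threshold are hypothetical, and plain Python lists stand in for real CLIP embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _direction(a: Vector, b: Vector) -> List[float]:
    """Unit vector pointing from a to b (zero vector if a == b)."""
    diff = [y - x for x, y in zip(a, b)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff] if norm > 0 else diff


def localized_patch_loss(
    src_patches: Sequence[Vector],   # embeddings of patches from the content image
    out_patches: Sequence[Vector],   # embeddings of the same patches after stylization
    src_text: Vector,                # embedding of a neutral prompt (assumed)
    style_text: Vector,              # embedding of the style prompt
    mask_cover: Sequence[float],     # fraction of each patch inside the target mask (assumed)
    cover_threshold: float = 0.5,    # hypothetical cut-off for "belongs to the target"
) -> float:
    """Mean directional loss over patches that overlap the target object."""
    text_dir = _direction(src_text, style_text)
    losses = []
    for src, out, cover in zip(src_patches, out_patches, mask_cover):
        if cover < cover_threshold:
            continue  # background patch: contributes nothing to the style loss
        img_dir = _direction(src, out)
        cosine = sum(i * t for i, t in zip(img_dir, text_dir))
        losses.append(1.0 - cosine)
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D "embeddings": the text direction is the positive x axis.
src_text, style_text = [0.0, 0.0], [1.0, 0.0]
src_patches = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
out_patches = [[2.0, 0.0], [1.0, 3.0], [1.0, 2.0]]  # aligned, orthogonal, opposite
mask_cover = [1.0, 0.6, 0.1]                        # last patch is background

loss = localized_patch_loss(src_patches, out_patches, src_text, style_text, mask_cover)
print(loss)
```

Because background patches are dropped before averaging, nothing in this term pushes them toward the style prompt. In a real pipeline the patch embeddings would come from a CLIP image encoder and the mask from a segmentation step, and a separate content-preservation term would typically keep non-target regions close to the source image.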

UATST: Towards Unpaired Arbitrary Text-Guided Style Transfer with Cross-Space Modulation

Artistic Style Transfer with Internal-external Learning and Contrastive Learning

Learning Structure-Aware Transformations for Arbitrary Image Style Transfer

ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

TeSTNeRF: Text-Driven 3D Style Transfer Via Cross-Modal Learning

DualAST: Dual Style-Learning Networks for Artistic Style Transfer

GLStyleNet: Exquisite Style Transfer Combining Global and Local Pyramid Features

Name Your Style: An Arbitrary Artist-aware Image Style Transfer

Language-Driven Image Style Transfer

Bridging Text and Image for Artist Style Transfer via Contrastive Learning

TextStyler: A CLIP-based approach to text-guided style transfer

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

DiffStyler: Diffusion-based Localized Image Style Transfer

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

ITstyler: Image-optimized Text-based Style Transfer

CLAST: Contrastive Learning for Arbitrary Style Transfer

Foreground and background separated image style transfer with a single text condition

CLIPstyler: Image Style Transfer with a Single Text Condition

Image Style Transfer Algorithm Based on Semantic Segmentation