Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate

Songhua Liu, Jingwen Ye, Xinchao Wang
2023-04-20
Abstract: Style transfer aims to render the style of a given style reference image onto a given content reference image, and has been widely adopted in artistic generation and image editing. Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to their content counterparts in a pre-defined way. In either case, only one result can be generated for a specific pair of content and style images, which lacks flexibility and makes it hard to satisfy users with different preferences. We propose here a novel strategy termed Any-to-Any Style Transfer to address this drawback, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions. In this way, personalizable style transfer is achieved through human-computer interaction. At the heart of our approach lie (1) a region segmentation module based on Segment Anything, which supports region selection with only a few clicks or drawings on images and thus takes user inputs conveniently and flexibly; and (2) an attention fusion module, which converts inputs from users into controlling signals for the style transfer model. Experiments demonstrate the effectiveness of our approach for personalizable style transfer. Notably, our approach works in a plug-and-play manner, is portable to any style transfer method, and enhances controllability. Our code is available at https://github.com/Huage001/Transfer-Any-Style.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the lack of flexibility in existing style transfer methods. Given a content image and a style image, existing methods produce only a single result, which makes it difficult to meet the personalized needs of different users. Users may want to apply different styles to different regions of the content image, but existing methods do not offer this interactivity and controllability.

To address this challenge, the authors propose a method named **Any-to-Any Style Transfer**, which lets users select specific regions in the style image and apply their styles to chosen regions in the content image. Users can thus customize the style transfer result according to their own preferences.

### Main problems

1. **Lack of flexibility**: Existing methods generate only one result for a specific pair of content and style images and cannot meet the diverse needs of different users.
2. **Poor user interactivity**: Users cannot easily control which style regions are applied to which content regions.

### Solutions

The authors propose two key techniques to solve these problems:

1. **Region segmentation module based on the Segment Anything Model (SAM)**:
   - With SAM, users select specific regions in the content image and the style image by simply clicking or drawing. SAM generates high-quality segmentation masks in real time, which makes the interaction simple and efficient.
2. **Attention fusion module**:
   - This module converts the region information selected by the user into control signals and combines them with the original style transfer model. Specifically, it modifies the default attention map so that the content regions selected by the user attend only to the style regions specified by the user, thereby achieving personalized style transfer.

### Method overview

- **Encoding phase**: Use the pre-trained VGG-19 network to extract features of the content image and the style image.
- **Interaction phase**: Obtain the masks of the content and style regions selected by the user through SAM.
- **Fusion phase**: Fuse the control signals from the user with the default attention map to generate a new attention map.
- **Decoding phase**: Compute the final stylized features from the updated attention map and generate the final style transfer result through the decoder.

### Experimental results

Experiments show that this method achieves highly personalized style transfer and is also compatible with other existing style transfer methods, enhancing their controllability. Users can customize the style transfer effect through simple interactive operations, such as clicking or drawing bounding boxes or contours, to obtain more satisfactory results.

In conclusion, this paper addresses the lack of flexibility and user interactivity in existing style transfer methods by introducing SAM and the attention fusion module, achieving a more personalized and controllable style transfer effect.
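The fusion phase described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fused_attention`, the scaled dot-product form of the default attention map, and the use of flattened feature arrays in place of VGG-19 features are all assumptions made for illustration, and only a single content/style region pair is handled. The idea shown is the one stated in the summary: rows of the attention map that belong to the selected content region are restricted to columns that belong to the selected style region, while all other rows keep their default attention.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; -inf entries receive zero weight."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fused_attention(content_feats, style_feats, content_mask, style_mask):
    """Hypothetical sketch of mask-guided attention fusion.

    content_feats: (Nc, d) flattened content features (assumed layout)
    style_feats:   (Ns, d) flattened style features (assumed layout)
    content_mask:  (Nc,) bool, True for user-selected content positions
    style_mask:    (Ns,) bool, True for user-selected style positions
    Returns the fused attention map (Nc, Ns) and stylized features (Nc, d).
    """
    content_mask = np.asarray(content_mask, dtype=bool)
    style_mask = np.asarray(style_mask, dtype=bool)
    if content_mask.any() and not style_mask.any():
        raise ValueError("a selected content region needs a non-empty style region")

    d = content_feats.shape[1]
    # Default attention logits: every content position scores every style position.
    logits = content_feats @ style_feats.T / np.sqrt(d)

    # Block pairs (selected content position, unselected style position).
    blocked = content_mask[:, None] & ~style_mask[None, :]
    logits = np.where(blocked, -np.inf, logits)

    attn = softmax(logits, axis=1)
    # Each stylized feature is an attention-weighted mix of style features.
    return attn, attn @ style_feats
```

In a full pipeline the two masks would come from the segmentation module, downsampled to the feature-map resolution and flattened, and the returned features would be passed to the decoder. Unselected content positions are left with their default attention here, which is one plausible reading of the description rather than a confirmed detail.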