TEXTOC: Text-driven Object-Centric Style Transfer

Jihun Park, Jongmin Gim, Kyoungmin Lee, Seunghun Lee, Sunghoon Im
2024-08-22
Abstract: We present Text-driven Object-Centric Style Transfer (TEXTOC), a novel method that guides style transfer at an object-centric level using textual inputs. The core of TEXTOC is our Patch-wise Co-Directional (PCD) loss, meticulously designed for precise object-centric transformations that are closely aligned with the input text. This loss combines a patch directional loss for text-guided style direction and a patch distribution consistency loss for an even distribution of CLIP embeddings across object regions. Together, these terms ensure a seamless and harmonious style transfer across object regions. Key to our method are the Text-Matched Patch Selection (TMPS) and Pre-fixed Region Selection (PRS) modules, which identify object locations via text and eliminate the need for segmentation masks. Lastly, we introduce an Adaptive Background Preservation (ABP) loss to maintain the original style and structural essence of the image's background. This loss is applied to dynamically identified background areas. Extensive experiments underline the effectiveness of our approach in creating visually coherent and textually aligned style transfers.
Computer Vision and Pattern Recognition
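To make the PCD loss concrete, here is a minimal pure-Python sketch of its two terms as this summary defines them: L_dir = (1/N) Σ_i (1 - cos(ΔP_i, ΔT)) and L_con = JSD(D_src, D_out), combined as L_pcd = λ_dir L_dir + λ_con L_con. It assumes the CLIP embeddings of the augmented source/output patches and of the source/target texts have already been computed and are passed in as plain lists of floats. The function names, the default weights λ_dir = λ_con = 1 and the small `eps` guard are illustrative assumptions, and because the summary does not say how D_src and D_out are built, they are treated here as arbitrary non-negative histograms. This is a sketch of the formulas, not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating L_dir, L_con and L_pcd from the summary.
# Inputs are assumed to be precomputed CLIP embeddings given as plain lists.
Vec = Sequence[float]


def _sub(u: Vec, v: Vec) -> List[float]:
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def _cosine(u: Vec, v: Vec, eps: float = 1e-8) -> float:
    """Cosine similarity; eps (an assumption) avoids dividing by a zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norms, eps)


def patch_directional_loss(out_patches: Sequence[Vec], src_patches: Sequence[Vec],
                           tgt_text: Vec, src_text: Vec) -> float:
    """L_dir = (1/N) * sum_i (1 - cos(dP_i, dT)).

    out_patches[i] / src_patches[i] play the role of E_I(aug(P_i^out)) / E_I(aug(P_i^src));
    tgt_text / src_text play the role of E_T(T_tgt) / E_T(T_src).
    """
    if not out_patches or len(out_patches) != len(src_patches):
        raise ValueError("need the same non-zero number of output and source patches")
    delta_t = _sub(tgt_text, src_text)
    terms = [1.0 - _cosine(_sub(o, s), delta_t) for o, s in zip(out_patches, src_patches)]
    return sum(terms) / len(terms)


def _normalize(p: Vec) -> List[float]:
    """Turn a non-negative histogram into a probability distribution."""
    total = sum(p)
    if total <= 0 or any(x < 0 for x in p):
        raise ValueError("distribution must be non-negative with positive mass")
    return [x / total for x in p]


def _kl(p: Vec, m: Vec) -> float:
    # Terms with p_i = 0 contribute 0; otherwise m_i >= p_i / 2 > 0, so the log is safe.
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon(p: Vec, q: Vec) -> float:
    """JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2, natural log."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    p, q = _normalize(p), _normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def pcd_loss(out_patches: Sequence[Vec], src_patches: Sequence[Vec],
             tgt_text: Vec, src_text: Vec, d_src: Vec, d_out: Vec,
             lambda_dir: float = 1.0, lambda_con: float = 1.0) -> float:
    """L_pcd = lambda_dir * L_dir + lambda_con * L_con (default weights are placeholders)."""
    l_dir = patch_directional_loss(out_patches, src_patches, tgt_text, src_text)
    l_con = jensen_shannon(d_src, d_out)
    return lambda_dir * l_dir + lambda_con * l_con


if __name__ == "__main__":
    src = [[0.0, 0.0], [0.0, 0.0]]
    out = [[1.0, 0.0], [0.0, 1.0]]  # patch 0 moves along dT, patch 1 moves orthogonally
    print(patch_directional_loss(out, src, [1.0, 0.0], [0.0, 0.0]))  # 0.5
    print(jensen_shannon([1, 0], [0, 1]))  # ln 2, the maximum for the natural log
```

Plain lists keep the arithmetic easy to check by hand: a patch whose embedding shift points along the text direction contributes 0 to L_dir, an orthogonal shift contributes 1, and an opposite shift contributes 2. In an actual training loop these terms would be computed on framework tensors so that gradients flow back to the stylized output image.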
### What problem does this paper attempt to solve?

This paper addresses text-driven object-centric style transfer. Specifically, the authors propose a new method, **Text-driven Object-Centric Style Transfer (TEXTOC)**, to achieve the following goals:

1. **Object-level style transfer**: Traditional style transfer methods usually rely on reference images to guide the stylization process. These methods struggle with complex scenes, especially when only specific objects should be stylized. TEXTOC addresses this by guiding style transfer directly with text descriptions.
2. **Preserving the structure of objects and backgrounds**: Many existing methods alter the original content of objects or the structure of the background during style transfer, producing visually inconsistent results. TEXTOC introduces several loss functions and modules so that objects keep their structural integrity and the background keeps its original style while the target style is applied.
3. **No segmentation masks or reference images**: Existing object-centric style transfer methods usually rely on segmentation masks or reference images to identify and process specific objects. TEXTOC removes the need for these additional inputs and localizes and stylizes objects from text descriptions alone.

### Main contributions

To achieve these goals, the paper proposes the following key components:

- **Patch-wise Co-Directional (PCD) Loss**: Aligns the direction of the style change with the text description and keeps the feature distribution consistent within the object region.

  \[ L_{\text{pcd}} = \lambda_{\text{dir}} L_{\text{dir}} + \lambda_{\text{con}} L_{\text{con}} \]

  where:

  - \(L_{\text{dir}}\) is the patch-wise directional loss:

    \[ L_{\text{dir}} = \frac{1}{N}\sum_{i=1}^{N}\left(1 - \frac{\Delta P_i \cdot \Delta T}{\|\Delta P_i\|\,\|\Delta T\|}\right) \]

    with

    \[ \Delta P_i = E_I(\text{aug}(P_i^{\text{out}})) - E_I(\text{aug}(P_i^{\text{src}})), \quad \Delta T = E_T(T_{\text{tgt}}) - E_T(T_{\text{src}}) \]

    Here the sum runs over \(N\) patches, \(P_i^{\text{src}}\) and \(P_i^{\text{out}}\) are the \(i\)-th patches of the source and output images, \(\text{aug}(\cdot)\) is an image augmentation, \(E_I\) and \(E_T\) are the CLIP image and text encoders, and \(T_{\text{src}}\) and \(T_{\text{tgt}}\) are the source and target texts.

  - \(L_{\text{con}}\) is the patch distribution consistency loss:

    \[ L_{\text{con}} = \text{JSD}(D_{\text{src}}, D_{\text{out}}) \]

    where JSD denotes the Jensen-Shannon divergence.

- **Adaptive Background Preservation (ABP) Loss**: Keeps the style and structure of the background unchanged and prevents the target style from being applied to the background. \(M_{\text{bg}}^*\) is the dynamically identified background mask and \(\odot\) denotes element-wise multiplication.

  \[ L_{\text{abp}} = L_{\text{MSSSIM}}(I_{\text{out}} \odot M_{\text{bg}}^*, I_{\text{src}} \odot M_{\text{bg}}^*) + L_{L1}(I_{\text{out}} \odot M_{\text{bg}}^*, I_{\text{src}}