Abstract: Style-guided text image generation tries to synthesize a text image by imitating a reference image's appearance while keeping the text content unaltered. Text image appearance has many aspects. In this paper, we focus on transferring the style image's background and foreground color patterns to the content image to generate photo-realistic text images. To achieve this goal, we propose 1) a content-style cross-attention-based pixel sampling approach to roughly mimic the style text image's background; 2) a pixel-wise style modulation technique to transfer the varying color patterns of the style image to the content image in a spatially adaptive manner; 3) a cross-attention-based multi-scale style fusion approach to solve the text foreground misalignment issue between style and content images; 4) an image patch shuffling strategy to create style, content, and ground-truth image tuples for training. Experimental results on Chinese handwriting text image synthesis with the SCUT-HCCDoc and CASIA-OLHWDB datasets demonstrate that the proposed method can improve the quality of synthetic text images and make them more photo-realistic.

What problem does this paper attempt to address?

The paper addresses style-guided text image generation: how to transfer the background and foreground color patterns of a reference (style) image to a target (content) image while keeping the text content unchanged, so that the result is a photo-realistic text image. The main challenges are complex backgrounds, varying lighting conditions, and misalignment of the foreground (text) between the style and content images.

To address these challenges, the paper makes four technical contributions:

1. **Pixel Sampling Module Based on Content-Style Cross-Attention (AttnPixamp)**: roughly imitates the background of the style text image.
2. **Pixel-level Style Modulation Technique (PixyMod)**: adaptively transfers the spatially varying color patterns of the style image to the content image.
3. **Multi-scale Style Fusion Module Based on Attention Mechanism (AttnMuSF)**: resolves the text foreground misalignment between the style and content images.
4. **Image Patch Shuffling Strategy (Single Crop)**: creates the style, content, and ground-truth image triplets required for training.

With these techniques, the paper aims to improve the quality of generated text images, especially for line-level style transfer. Experiments on Chinese handwritten text image synthesis with the SCUT-HCCDoc and CASIA-OLHWDB datasets show that the proposed method improves the quality of the synthesized images and makes them more photo-realistic.
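To make the first contribution more concrete, the following is a minimal NumPy sketch of the general idea of cross-attention-based pixel sampling: each content position attends to all style positions and receives a weighted average of the style image's pixel colors. It is not the paper's implementation. The function name `attention_pixel_sampling`, the feature shapes, the scaled dot-product scoring, and the `temperature` parameter are all assumptions made for illustration, and random arrays stand in for the encoder features that a real model would learn.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pixel_sampling(content_feat, style_feat, style_rgb, temperature=1.0):
    """Hypothetical sketch: sample style pixels for every content position.

    content_feat: (Hc, Wc, D) features of the content image (queries)
    style_feat:   (Hs, Ws, D) features of the style image (keys)
    style_rgb:    (Hs, Ws, 3) style image pixels aligned with style_feat (values)
    Returns the sampled image (Hc, Wc, 3) and the attention weights (Hc*Wc, Hs*Ws).
    """
    hc, wc, d = content_feat.shape
    q = content_feat.reshape(-1, d)                 # (Hc*Wc, D)
    k = style_feat.reshape(-1, d)                   # (Hs*Ws, D)
    v = style_rgb.reshape(-1, style_rgb.shape[-1])  # (Hs*Ws, 3)

    # Scaled dot-product similarity between content and style positions.
    scores = q @ k.T / (np.sqrt(d) * temperature)
    weights = softmax(scores, axis=-1)              # each row sums to 1

    # Each output pixel is a convex combination of style pixels.
    sampled = weights @ v
    return sampled.reshape(hc, wc, -1), weights


# Usage with random stand-in features (assumed shapes, not from the paper).
rng = np.random.default_rng(0)
content_feat = rng.normal(size=(4, 6, 8))
style_feat = rng.normal(size=(5, 7, 8))
style_rgb = rng.uniform(0.0, 1.0, size=(5, 7, 3))

coarse, attn = attention_pixel_sampling(content_feat, style_feat, style_rgb)
print(coarse.shape, attn.shape)
```

Because every output pixel is a weighted average of style pixels, the result can only contain colors that lie within the range of the style image, which is consistent with the "rough" background imitation described above. A finer stage, such as pixel-wise modulation, would then be needed to recover spatially varying detail.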

APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation

Evaluate and Improve the Quality of Neural Style Transfer.

Diversified Patch-based Style Transfer with Shifted Style Normalization

Style Permutation for Diversified Arbitrary Style Transfer

UATST: Towards Unpaired Arbitrary Text-Guided Style Transfer with Cross-Space Modulation

PencilArt: A Chromatic Penciling Style Generation Framework.

Image Neural Style Transfer with Preserving the Salient Regions.

Intelligent Typography: Artistic Text Style Transfer for Complex Texture and Structure

Attention-Aware Multi-Stroke Style Transfer

TextStyler: A CLIP-based approach to text-guided style transfer

InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation

StyleAdapter: A Unified Stylized Image Generation Model

Multitask Attentive Network for Text Effects Quality Assessment.

Interactive Image Style Transfer Guided by Graffiti

A model integrating attention mechanism and generative adversarial network for image style transfer

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

Scene Style Text Editing

StyleDrop: Text-to-Image Generation in Any Style