Abstract: Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for complex glyph structures like Chinese characters. To address this problem, we introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language. We first sophisticatedly design the image-text dataset's construction strategy, then build our model specifically on a diffusion-based image generator and carefully modify the network structure to allow the model to learn drawing language characters with the help of glyph and position information. Furthermore, we maintain the model's open-domain image synthesis capability by preventing catastrophic forgetting by using parameter-efficient fine-tuning techniques. Extensive qualitative and quantitative experiments demonstrate that our method not only produces accurate language characters as in prompts, but also seamlessly blends the generated text into the background. Please refer to our project page: https://1073521013.github.io/glyph-draw.github.io/

What problem does this paper attempt to address?

This paper addresses the inability of current text-to-image models to render text coherently inside generated images, particularly for scripts with complex glyph structures such as Chinese. Existing image-generation models can produce high-quality, diverse images from user instructions, but the text they draw within those images is often incoherent. The paper introduces GlyphDraw, a general learning framework that aims to give image-generation models the ability to generate images with coherently embedded text in any specified language.

The paper notes that some methods can render English text by relying on pre-trained language models (such as T5-XXL), but their ability to generate non-Latin characters such as Chinese remains limited. The main reason is that Chinese characters have a more complex two-dimensional spatial structure, built from eight different types of strokes, and the set of commonly used characters runs into the thousands. Generating accurate and diverse Chinese characters is therefore harder and remains an open research problem. In addition, freezing a pre-trained language model is inflexible and hard to adapt to a user-specified downstream language, while training a language-specific model from scratch is costly and requires a large amount of data. The authors therefore design a general, flexible algorithm that tackles visual text rendering with a lightweight training strategy and dataset.

To this end, the paper proposes the GlyphDraw framework, which uses character glyphs and text positions as auxiliary information to give greater control over character generation.
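
The idea of supplying glyph and position information as auxiliary inputs can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `position_mask` and `stack_conditions`, the box format `(top, left, bottom, right)`, and the choice to concatenate the extra inputs as channels are all assumptions made for illustration, and the glyph image here is a placeholder array rather than a rendered character.

```python
import numpy as np


def position_mask(height, width, box):
    """Binary mask that is 1.0 inside box=(top, left, bottom, right), 0.0 elsewhere.

    The box format is a hypothetical convention chosen for this sketch.
    """
    top, left, bottom, right = box
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("box must be non-empty and lie inside the image")
    mask = np.zeros((height, width), dtype=np.float32)
    mask[top:bottom, left:right] = 1.0
    return mask


def stack_conditions(latent, mask, glyph):
    """Append mask and glyph (each H x W) to a latent (C x H x W) as extra channels.

    Channel concatenation is one plausible way to pass auxiliary
    information to a generator; it is an assumption, not the source's method.
    """
    if mask.shape != latent.shape[1:] or glyph.shape != latent.shape[1:]:
        raise ValueError("mask and glyph must match the latent's spatial size")
    return np.concatenate([latent, mask[None], glyph[None]], axis=0)


# Illustrative values only: a 4-channel 8x8 latent and a text box of 2 rows x 5 columns.
latent = np.zeros((4, 8, 8), dtype=np.float32)
mask = position_mask(8, 8, (2, 1, 4, 6))
glyph = np.zeros((8, 8), dtype=np.float32)  # placeholder for a rendered glyph image
conditioned = stack_conditions(latent, mask, glyph)
```

In this sketch the mask tells the generator where text should appear and the glyph channel tells it what the characters look like; a real system would render the glyph from a font and may inject the two signals differently.
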
The method generates diverse visual text that follows the given instructions, chooses a fitting font style, and blends the text seamlessly into the background, while maintaining high generation quality and avoiding overfitting and catastrophic forgetting. The paper validates the method experimentally on Chinese and English character rendering, reporting OCR accuracies of 74% and 75%, significantly better than previous image-synthesis methods.
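
The summary does not say how OCR accuracy is computed, so the following is only a sketch of one common definition: the fraction of generated images whose recognized text exactly matches the text requested in the prompt. The function name `ocr_accuracy` and the sample strings are hypothetical, and no OCR engine is called; the recognized strings are assumed to come from an external OCR step.

```python
def ocr_accuracy(expected_texts, recognized_texts):
    """Fraction of samples whose OCR output exactly matches the expected text.

    Exact match after stripping surrounding whitespace is an assumption;
    the source does not specify the matching rule.
    """
    if len(expected_texts) != len(recognized_texts):
        raise ValueError("both lists must have the same length")
    if not expected_texts:
        raise ValueError("at least one sample is required")
    hits = sum(
        expected.strip() == recognized.strip()
        for expected, recognized in zip(expected_texts, recognized_texts)
    )
    return hits / len(expected_texts)


# Illustrative data only: the second sample has an OCR error ("0" instead of "O").
expected = ["你好", "OPEN", "茶"]
recognized = ["你好", "0PEN", "茶"]
accuracy = ocr_accuracy(expected, recognized)
```

Exact matching is strict: a single wrong character fails the whole sample. A character-level metric would give partial credit instead, which matters when comparing scores across papers.
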

GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation
