Abstract: Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser's state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods. The code is available at https://github.com/yeungchenwa/FontDiffuser.

What problem does this paper attempt to address?

The paper addresses two major issues in automatic font generation: generating complex characters and handling large style variations.

### Research Background and Objectives

- **Research Background**: Automatic font generation is an imitation task that aims to create a font library that mimics the style of a reference image while preserving the content of the source image. Although existing methods achieve satisfactory performance, they still struggle with complex characters and large style variations.
- **Specific Issues**:
  - Complex Character Generation: For characters with complex stroke structures, existing methods fail to retain all details, leading to missing strokes and incorrect layouts.
  - Large Style Variations: When the source and reference images differ significantly in style, existing methods struggle to transfer the style effectively, often producing inconsistent styles.
- **Objectives**: The paper proposes FontDiffuser, a one-shot image-to-image font generation method based on a diffusion model, to address both issues.

### Method Overview

- **Basic Framework**: FontDiffuser builds on a conditional diffusion model and treats font generation as a noise-to-denoise process.
- **Multi-scale Content Aggregation (MCA) Block**: To better retain the details of complex characters, the paper introduces the MCA block, which combines global and local content cues across different scales.
- **Style Contrastive Refinement (SCR) Module**: To better manage large style variations, the paper proposes the SCR module, a structure for style representation learning. It uses a style extractor to disentangle styles from images and supervises the diffusion model with a style contrastive loss, so that the generated font style stays consistent with the target style.

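
The noise-to-denoise framing in the method overview can be illustrated with a minimal sketch of a diffusion forward process. It assumes the standard DDPM formulation with a linear noise schedule, which this summary does not confirm as the paper's exact setup. The function names, the schedule values, and the toy four-pixel "glyph" are all hypothetical, and the content and style conditioning that FontDiffuser adds is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha_bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, t, alpha_bars):
    """Invert the forward process given a noise estimate eps_hat."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
glyph = [1.0, -1.0, 1.0, -1.0]  # toy flattened "glyph" pixels in [-1, 1]
noisy, eps = add_noise(glyph, 500, alpha_bars, rng)
# With a perfect noise estimate the clean glyph is recovered exactly;
# a trained denoiser would supply an approximate eps_hat instead.
recovered = predict_x0(noisy, eps, 500, alpha_bars)
```

In a conditional model of this kind, the denoiser that estimates the noise would also receive the source (content) image and the reference (style) image as inputs, which is where the MCA block and the SCR supervision described above come in.
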
### Main Contributions

- Proposes FontDiffuser, a one-shot image-to-image font generation framework based on a diffusion model, which achieves state-of-the-art performance on complex characters and large style variations.
- Designs the Multi-scale Content Aggregation (MCA) block to better preserve the fine strokes of complex characters.
- Introduces the Style Contrastive Refinement (SCR) module to supervise the diffusion model so that it handles large style variations effectively.
- Shows that FontDiffuser performs well on characters of varying complexity and supports cross-language generation, for example from Chinese to Korean.
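
The idea of supervising generation with a style contrastive loss can be sketched with a generic InfoNCE-style objective. This is an assumption for illustration: the summary does not give the paper's exact loss, and the function names, the temperature value, and the three-dimensional "style embeddings" below are hypothetical stand-ins for the output of a style extractor.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def style_contrastive_loss(generated, positive, negatives, temperature=0.07):
    """InfoNCE-style loss: pull the generated style toward the target style
    embedding and push it away from embeddings of other styles."""
    pos = math.exp(cosine(generated, positive) / temperature)
    neg = sum(math.exp(cosine(generated, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: the generated glyph is close to the target style.
generated = [0.9, 0.1, 0.0]
target_style = [1.0, 0.0, 0.0]
other_styles = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_matching = style_contrastive_loss(generated, target_style, other_styles)
# Treating a different style as the "positive" yields a much larger loss.
loss_mismatched = style_contrastive_loss(
    generated, other_styles[0], [target_style, other_styles[1]]
)
```

The loss is small when the generated sample's style embedding matches the target and large when it resembles another style, which is the kind of signal that could guide a generator toward the reference style.
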

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

Chinese Character Font Generation Based on Diffusion Model

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Few-shot Font Generation based on SAE and Diffusion Model

Few-shot Font Generation by Learning Style Difference and Similarity

Few-shot Font Style Transfer with Multiple Style Encoders

DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation

CF-Font: Content Fusion for Few-shot Font Generation

Font Style Interpolation with Diffusion Models

TextDiffuser: Diffusion Models as Text Painters

GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

FET-GAN: Font and Effect Transfer via K-shot Adaptive Instance Normalization

DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

GlyphDiffusion: Text Generation as Image Generation

XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation

Few shot font generation via transferring similarity guided global style and quantization local style

Calliffusion: Chinese Calligraphy Generation and Style Transfer with Diffusion Modeling

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers