FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin
2023-12-19
Abstract: Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser's state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods. The code is available at https://github.com/yeungchenwa/FontDiffuser.
Computer Vision and Pattern Recognition, Artificial Intelligence
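The abstract frames font imitation as a noise-to-denoise paradigm. The sketch below illustrates what that typically means at training time, assuming the standard DDPM formulation: a linear beta schedule and an epsilon-prediction objective, neither of which is specified in this summary. The target glyph is simplified to a flat list of pixel values, and `predict_noise` is a hypothetical stand-in for the conditional denoiser that receives the content (source) and style (reference) images.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward (noising) step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def noise_prediction_loss(predict_noise, x0, content, style, alpha_bars, rng):
    """Mean squared error between the true noise and the predicted noise at a random step.

    `predict_noise(xt, t, content, style)` is hypothetical: it stands in for the
    conditional denoiser, which sees the noisy target plus content and style conditions.
    """
    t = rng.randrange(len(alpha_bars))
    xt, eps = add_noise(x0, t, alpha_bars, rng)
    eps_hat = predict_noise(xt, t, content, style)
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)
```

At inference the process runs in reverse: starting from pure noise, the denoiser repeatedly removes predicted noise while conditioned on the source character and the style reference, which is the sense in which generation goes "from noise to denoised glyph".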
What problem does this paper attempt to address?
The paper aims to address two major issues in automatic font generation: generating complex characters and handling large style variations.

### Research Background and Objectives

- **Research Background**: Automatic font generation is an imitation task that aims to create a font library that mimics the style of a reference image while preserving the content of the source image. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations.
- **Specific Issues**:
  - Complex character generation: for characters with complex stroke structures, existing methods struggle to preserve fine details, leading to missing strokes and incorrect layouts.
  - Large style variations: when the source and reference images differ greatly in style, existing methods struggle to transfer the style effectively and often produce inconsistent styles.
- **Objectives**: The paper proposes FontDiffuser, a one-shot image-to-image font generation method based on a diffusion model, to address these issues.

### Method Overview

- **Basic Framework**: FontDiffuser adopts a conditional diffusion model framework and models font generation as a noise-to-denoise process.
- **Multi-Scale Content Aggregation (MCA) Block**: To better preserve the details of complex characters, the paper introduces the MCA block, which combines global and local content cues at different scales.
- **Style Contrastive Refinement (SCR) Module**: To better handle large style variations, the SCR module supervises the diffusion model through a novel style representation learning strategy, encouraging the generated font style to match the target style.
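The MCA bullet says only that global and local content cues are combined across scales. As a toy illustration of that idea, and not the paper's actual block, the sketch below upsamples a coarse (global) content feature map to the resolution of a fine (local) one, concatenates them along the channel axis, and reweights each channel with a squeeze-and-excitation style gate. Feature maps are plain nested lists of shape channels x H x W, and the per-channel gate weights are hypothetical fixed scalars; a real block would learn them (e.g., with small MLPs and convolutions).

```python
import math

def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one H x W channel (list of rows) by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in channel for _ in range(factor)]

def channel_gate(channels, weights):
    """SE-style reweighting: scale channel c by sigmoid(w_c * global average of channel c)."""
    if len(weights) != len(channels):
        raise ValueError("need one gate weight per channel")
    gated = []
    for ch, w in zip(channels, weights):
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))  # global cue per channel
        g = 1.0 / (1.0 + math.exp(-w * mean))
        gated.append([[g * v for v in row] for row in ch])
    return gated

def aggregate_scales(local_feats, global_feats, factor, weights):
    """Hypothetical multi-scale fusion: bring coarse (global) content channels to the fine
    resolution, concatenate them with fine (local) channels, then reweight every channel."""
    upsampled = [upsample_nearest(ch, factor) for ch in global_feats]
    fused = local_feats + upsampled  # concatenation along the channel axis
    return channel_gate(fused, weights)
```

The design point the sketch tries to convey is that coarse features carry layout-level information about the whole character while fine features keep thin strokes, so fusing both at the same resolution lets later layers draw on either.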
### Main Contributions

- Proposes FontDiffuser, a diffusion-based one-shot image-to-image font generation framework that achieves state-of-the-art performance on complex characters and large style variations.
- Designs the Multi-Scale Content Aggregation (MCA) block to better preserve fine strokes in complex characters.
- Introduces the Style Contrastive Refinement (SCR) module to supervise the diffusion model, enabling it to handle large style variations effectively.
- Shows that FontDiffuser performs well on characters of varying complexity and supports cross-lingual generation, e.g., from Chinese to Korean.
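The SCR module supervises the diffusion model through a style contrastive loss, but this summary does not give the formula. A common choice for such an objective is InfoNCE, sketched below under that assumption: the anchor is the style embedding of the generated glyph, the positive is the embedding of a sample in the target style, and the negatives are embeddings of samples in other styles. All vectors here are hypothetical stand-ins for style-extractor outputs, and the temperature `tau=0.07` is an assumed default.

```python
import math

def cosine(u, v):
    """Cosine similarity, with a small floor on the norm product to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norms, 1e-12)

def style_contrastive_loss(anchor, positive, negatives, tau=0.07):
    """InfoNCE: -log( exp(s_pos/tau) / (exp(s_pos/tau) + sum_j exp(s_neg_j/tau)) )."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]
```

The loss is near zero when the generated glyph's style embedding points toward the target style and away from the other styles, and it grows as the embedding drifts toward a negative style, which is the pressure that pushes generated glyphs toward the reference style.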