Diff-Font: Diffusion Model for Robust One-Shot Font Generation

Chen, Xinyuan,Wang, Chaoyue
DOI: https://doi.org/10.1007/s11263-024-02137-0
IF: 13.369
2024-06-15
International Journal of Computer Vision
Abstract:Font generation presents a significant challenge due to the intricate details needed, especially for languages with complex ideograms and numerous characters, such as Chinese and Korean. Although various few-shot (or even one-shot) font generation methods have been introduced, most of them rely on GAN-based image-to-image translation frameworks that still face (i) unstable training issues, (ii) limited fidelity in replicating font styles, and (iii) imprecise generation of complex characters. To tackle these problems, we propose a unified one-shot font generation framework called Diff-Font, based on the diffusion model. In particular, we approach font generation as a conditional generation task, where the content of characters is managed through predefined embedding tokens and the desired font style is extracted from a one-shot reference image. For glyph-rich characters such as Chinese and Korean, we incorporate additional inputs for strokes or components as fine-grained conditions. Owing to the proposed diffusion training process, these three types of information can be effectively modeled, resulting in stable training. Simultaneously, the integrity of character structures can be learned and preserved. To the best of our knowledge, Diff-Font is the first work to utilize a diffusion model for font generation tasks. Comprehensive experiments demonstrate that Diff-Font outperforms prior font generation methods in both high-fidelity font style replication and the generation of intricate characters. Our method achieves state-of-the-art results in both qualitative and quantitative aspects.
computer science, artificial intelligence
What problem does this paper attempt to address?