Abstract: In this paper, we propose Calliffusion, a system for generating high-quality Chinese calligraphy using diffusion models. Our model architecture is based on DDPM (Denoising Diffusion Probabilistic Models), and it is capable of generating common characters in five different scripts and mimicking the styles of famous calligraphers. Experiments demonstrate that our model can generate calligraphy that is difficult to distinguish from real artworks and that our controls for characters, scripts, and styles are effective. Moreover, we demonstrate one-shot transfer learning, using LoRA (Low-Rank Adaptation) to transfer Chinese calligraphy art styles to unseen characters and even out-of-domain symbols such as English letters and digits.
What problem does this paper attempt to address?
This paper addresses the problem of generating high-quality Chinese calligraphy while giving users effective control over the generated characters, scripts, and calligrapher styles. The authors propose Calliffusion, a system that uses diffusion models and pursues three goals:
1. **Generate high-quality calligraphy works**: Existing generative models struggle to produce high-quality Chinese calligraphy, especially when characters, scripts, and styles must stay consistent. Calliffusion addresses this by building on DDPM (Denoising Diffusion Probabilistic Models).
2. **Effective control mechanism**: Users can specify the character, script, and calligrapher style of the generated work. This is useful for studying and imitating the works of famous calligraphers.
3. **Style transfer**: One-shot fine-tuning with LoRA (Low-Rank Adaptation) lets the model transfer existing calligraphy styles to unseen characters and even to non-Chinese symbols such as English letters and digits.
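The low-rank idea behind LoRA can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `LoRALinear`, the rank, the scaling factor, and the layer sizes are all hypothetical, and a single linear layer stands in for the layers of a real diffusion network. The pretrained weight stays frozen, and only two small matrices `A` and `B` would be trained during one-shot fine-tuning.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pretrained weight
        self.A = rng.standard_normal((rank, in_dim)) * 0.01    # trainable, small random init
        self.B = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank                              # scaling of the update

    def __call__(self, x):
        base = x @ self.weight.T                               # original behaviour
        update = (x @ self.A.T) @ self.B.T                     # low-rank correction
        return base + self.scale * update


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(1)
weight = rng.standard_normal((8, 16))
layer = LoRALinear(weight, rank=2, alpha=2.0)
x = rng.standard_normal((3, 16))

trainable = layer.A.size + layer.B.size
print(trainable, weight.size)
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and fine-tuning only has to learn a small correction. That small parameter count is what makes adapting from a single example plausible.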
### Main contributions
- **First application of diffusion models to Chinese calligraphy**: The work is presented as the first to use diffusion models for generating high-quality Chinese calligraphy.
- **Controllable generation**: The model generates calligraphy according to the specified character, script, and style.
- **Style transfer technology**: Through one-shot fine-tuning, the model can apply existing scripts and writing styles to unseen characters and symbols.
### Method overview
- **Diffusion model architecture**: The model uses a U-Net backbone with DDPM sampling. The forward process gradually adds noise to calligraphy images, and the reverse process generates images by denoising step by step.
- **External condition control**: The generated content is controlled by text descriptions such as "Ren (the character 'person') in Lishu (clerical script) from Cao Quan Stele". These descriptions are encoded by a pre-trained Chinese BERT model and then combined with the image to condition generation.
- **Style transfer**: LoRA is used for one-shot fine-tuning to transfer a style to new characters or symbols.
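The forward (noising) process of a DDPM has a closed form, which the following NumPy sketch illustrates. The linear schedule, the 1000 steps, and the random 64×64 array standing in for a calligraphy image are assumptions borrowed from common DDPM setups; the summary does not state the values used in Calliffusion.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in one step.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I)
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


num_steps = 1000                            # assumed number of diffusion steps
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)         # cumulative signal retention

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a normalized image
x_t, eps = forward_noise(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

In a standard DDPM, the denoising network (here, the U-Net) is trained to predict `eps` from `x_t`, the step index `t`, and any conditioning such as the encoded text description. Sampling then starts from pure noise and applies the learned denoiser repeatedly.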
### Experimental results
- **Objective evaluation**: A pre-trained classifier is applied to the generated calligraphy images, and it recognizes the generated works with high accuracy.
- **Subjective evaluation**: In questionnaires, participants were asked to tell generated works from real ones, and the results show that humans find them difficult to distinguish.
- **Limitations**: Some generated results have missing or redundant strokes. These problems can be reduced by increasing the amount of training data and the number of training epochs.
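The objective evaluation amounts to comparing what a classifier predicts for each generated image with what was requested. The sketch below shows that comparison for one control, the script. The labels are made-up toy data, not results from the paper, and the helper name `accuracy` is hypothetical.

```python
import numpy as np


def accuracy(predicted, requested):
    """Fraction of generated images whose predicted label matches the request."""
    predicted = np.asarray(predicted)
    requested = np.asarray(requested)
    return float(np.mean(predicted == requested))


# Toy labels for four generated images (illustrative only).
requested_scripts = ["lishu", "kaishu", "lishu", "caoshu"]
predicted_scripts = ["lishu", "kaishu", "xingshu", "caoshu"]

print(accuracy(predicted_scripts, requested_scripts))  # 0.75
```

The same comparison could be repeated for character and calligrapher-style labels to check each control separately.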
### Conclusion
Calliffusion shows potential for generating high-quality Chinese calligraphy while effectively controlling the generated content and style. Future work will explore few-shot style transfer to adapt to more novel scripts and styles.