Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, Yihuai Zhang, Xingjiao Wu, Daoguo Dong, Liang He
Abstract: Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose a new Chinese calligraphy generation model, 'Moyun', which replaces the Unet in the Diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on our large-scale dataset 'Mobao' of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for calligraphy the calligrapher has not written, 'Moyun' can generate calligraphy that matches the style of the calligrapher.
What problem does this paper attempt to address?
This paper addresses controllable Chinese calligraphy generation: producing calligraphy in a specific style by specifying the calligrapher, the font, and the character style. Existing techniques can transfer style, but generating work for a specified calligrapher, font, and character style remains difficult. Specifically:
1. **Style Control**: Existing models struggle to control the style of the generated calligraphy precisely, especially when the calligrapher, font, and character style are specified.
2. **Structure Matching**: Calligraphy generated by existing models still differs from real calligraphy, especially in stroke structure and brushstroke details.
3. **Dataset Scale**: Existing datasets are small and their annotations lack detail, which limits what the models can learn.
To address these challenges, the authors propose a new Chinese calligraphy generation model, "Moyun". Its main innovations are:
- **Introducing Vision Mamba**: Vision Mamba replaces the Unet in the diffusion model and processes the images, with the aim of better capturing the structural relationships between strokes.
- **TripleLabel Control Mechanism**: A multi-label control mechanism combines the calligrapher, font, and character labels to control the generation process, which makes the generation controllable.
- **Large-scale Dataset**: The authors construct "Mobao", a large-scale dataset of more than 1.9 million high-resolution binarized images, which gives the model far more material to learn from.
With these improvements, "Moyun" can control style more accurately during generation and produce works that closely resemble real calligraphy. It can even generate characters a calligrapher never wrote, in a style consistent with that calligrapher.
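To make the idea of conditioning on three labels concrete, here is a minimal sketch of how calligrapher, font, and character labels could be turned into a single conditioning vector. It is not the authors' implementation: the summary does not say how TripleLabel combines the labels, so the element-wise sum, the `LabelEmbedding` class, the `triple_label_condition` function, the vocabulary sizes, and the embedding dimension are all assumptions made for illustration.

```python
import random


class LabelEmbedding:
    """Hypothetical lookup table mapping a label id to a fixed-size vector.

    The vectors are randomly initialised here; in a trained model they
    would be learned parameters.
    """

    def __init__(self, num_labels, dim, seed):
        rng = random.Random(seed)
        self.table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_labels)
        ]

    def __call__(self, label_id):
        return self.table[label_id]


def triple_label_condition(calligrapher_id, font_id, character_id, embeds):
    """Combine three label embeddings into one conditioning vector.

    Assumption: the embeddings are combined by element-wise sum. The
    source only states that the three labels are combined.
    """
    vectors = [
        embeds["calligrapher"](calligrapher_id),
        embeds["font"](font_id),
        embeds["character"](character_id),
    ]
    return [sum(values) for values in zip(*vectors)]


# Illustrative sizes only; they are not taken from the paper.
DIM = 8
embeds = {
    "calligrapher": LabelEmbedding(num_labels=4, dim=DIM, seed=0),
    "font": LabelEmbedding(num_labels=5, dim=DIM, seed=1),
    "character": LabelEmbedding(num_labels=10, dim=DIM, seed=2),
}

# One conditioning vector for (calligrapher 0, font 1, character 2).
cond = triple_label_condition(0, 1, 2, embeds)
print(len(cond))
```

In a diffusion model, a vector like `cond` would typically be passed to the denoising network alongside the timestep embedding. Because each label has its own table, a calligrapher can be paired with a character id that never co-occurred with it in training, which is one plausible reading of how the model handles characters a calligrapher never wrote.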