AnimeDiff: Customized Image Generation of Anime Characters Using Diffusion Model

Yuqi Jiang, Qiankun Liu, Dongdong Chen, Lu Yuan, Ying Fu
DOI: https://doi.org/10.1109/tmm.2024.3415357
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Due to the unprecedented power of text-to-image diffusion models, customizing these models to generate new concepts has gained increasing attention. Existing methods have had some success with real-world concepts but produce low-quality results for anime characters. We find empirically that this low quality stems from the newly introduced identifier text tokens, which are optimized to identify different characters. In this paper, we propose AnimeDiff, which focuses on customized image generation of anime characters. AnimeDiff binds anime characters directly to their names and keeps the embeddings of the text tokens unchanged. Furthermore, when multiple characters are composed in a single image, the model tends to confuse the properties of those characters. To address this issue, AnimeDiff incorporates a Cut-and-Paste data augmentation strategy that produces multi-character training images by cutting out multiple characters and pasting them onto background images. Experiments demonstrate the superiority of AnimeDiff over other methods.
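The Cut-and-Paste idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how characters are cut out, scaled, or placed, so everything below is an assumption made for illustration: the function name `cut_and_paste` is hypothetical, cutouts are assumed to arrive as RGBA arrays whose alpha channel marks the character, placement is uniformly random, and overlap between characters is not prevented. It is not the authors' implementation.

```python
import numpy as np


def cut_and_paste(background, cutouts, seed=0):
    """Paste RGBA character cutouts onto an RGB background at random positions.

    background: (H, W, 3) uint8 array.
    cutouts: list of (h, w, 4) uint8 arrays, each no larger than the background;
             the alpha channel is assumed to mark the character pixels.
    Returns the composite image and one (top, left, height, width) box per cutout.
    """
    rng = np.random.default_rng(seed)
    canvas = background.astype(np.float32)
    H, W = canvas.shape[:2]
    boxes = []
    for cutout in cutouts:
        h, w = cutout.shape[:2]
        # Choose a top-left corner so the cutout stays fully inside the canvas.
        top = int(rng.integers(0, H - h + 1))
        left = int(rng.integers(0, W - w + 1))
        rgb = cutout[..., :3].astype(np.float32)
        alpha = cutout[..., 3:4].astype(np.float32) / 255.0
        # Standard alpha compositing: character over the current canvas region.
        region = canvas[top:top + h, left:left + w]
        canvas[top:top + h, left:left + w] = alpha * rgb + (1.0 - alpha) * region
        boxes.append((top, left, h, w))
    return canvas.round().astype(np.uint8), boxes
```

In a training pipeline of this kind, the returned boxes could be used to pair each pasted character with its name in the caption, although the abstract does not state how captions for the augmented images are built.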