CogCartoon: Towards Practical Story Visualization

Zhongyang Zhu, Jie Tang
2023-12-17
Abstract: The state-of-the-art methods for story visualization demonstrate a significant demand for training data and storage, as well as limited flexibility in story presentation, thereby rendering them impractical for real-world applications. We introduce CogCartoon, a practical story visualization method based on pre-trained diffusion models. To alleviate dependence on data and storage, we propose an innovative strategy of character-plugin generation that can represent a specific character as a compact 316 KB plugin by using a few training samples. To facilitate enhanced flexibility, we employ a strategy of plugin-guided and layout-guided inference, enabling users to seamlessly incorporate new characters and custom layouts into the generated image results at their convenience. We have conducted comprehensive qualitative and quantitative studies, providing compelling evidence for the superiority of CogCartoon over existing methodologies. Moreover, CogCartoon demonstrates its power in tackling challenging tasks, including long story visualization and realistic style story visualization.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two main drawbacks of existing story visualization methods:

1. **Data and Storage Dependence**: Existing story visualization methods rely heavily on large amounts of training data and storage. For example, the commonly used FlintstonesSV and PororoSV datasets contain 20,132 and 10,191 training samples, respectively. Collecting tens of thousands of samples is unrealistic in the early stage of storybook creation. In addition, these methods must store a separate model for each independent story, which is impractical in large-scale commercial scenarios, where there are usually many independent stories.
2. **Lack of Flexibility**: Existing methods offer limited flexibility in integrating new characters and controlling the layout. In practical applications, users often need to insert new characters and adjust the layout at any time. These methods struggle to meet such requirements because they rely on character-specific datasets for fine-tuning and lack layout control strategies.

To solve these problems, the paper proposes a practical story visualization framework, **CogCartoon**. CogCartoon overcomes the limitations of existing methods through the following two strategies:

- **Character Plug-in Generation**: Using a small number of training samples, a specific character can be represented as a compact plug-in of only 316 KB. Storing multiple independent stories then requires only the individual character plug-ins and one shared diffusion model, which reduces the dependence on data and storage.
- **Plug-in-Guided and Layout-Guided Inference**: Users can flexibly add new characters and modify character positions as needed. Specifically, to introduce a new character, the user creates the corresponding plug-in from a small number of samples. The proposed inference method then generates a story illustration containing the new character by combining the newly created plug-in, the existing character plug-ins, and a custom layout.

Through these strategies, CogCartoon is more efficient in terms of data and storage and also more flexible, making it better suited to story visualization in practical applications.
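The storage argument can be made concrete with a little arithmetic. The sketch below compares keeping one full model per story against keeping a single shared diffusion model plus one plug-in per character. Only the 316 KB plug-in size comes from the paper; the model size, the number of stories, and the characters per story are hypothetical values chosen purely for illustration, and the function names are not from the paper.

```python
# Per-character plug-in size reported in the abstract.
PLUGIN_KB = 316


def per_story_models_mb(num_stories: int, model_mb: float) -> float:
    """Storage (MB) when every independent story keeps its own model."""
    return num_stories * model_mb


def shared_model_with_plugins_mb(
    num_characters: int, model_mb: float, plugin_kb: float = PLUGIN_KB
) -> float:
    """Storage (MB) for one shared model plus one plug-in per character."""
    return model_mb + num_characters * plugin_kb / 1024


# Hypothetical inputs (assumptions, not figures from the paper).
ASSUMED_MODEL_MB = 4096       # assumed size of one diffusion model checkpoint
NUM_STORIES = 100             # assumed number of independent stories
CHARACTERS_PER_STORY = 5      # assumed number of characters in each story

baseline = per_story_models_mb(NUM_STORIES, ASSUMED_MODEL_MB)
with_plugins = shared_model_with_plugins_mb(
    NUM_STORIES * CHARACTERS_PER_STORY, ASSUMED_MODEL_MB
)

print(f"one model per story:    {baseline:.1f} MB")
print(f"shared model + plugins: {with_plugins:.1f} MB")
```

Under these assumed numbers the plug-ins add only a small amount on top of the single shared model, whereas the per-story approach grows linearly with the number of stories. The real ratio depends on the actual checkpoint size and story count.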