CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation

Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu
DOI: https://doi.org/10.1145/3664647.3680873
2024-01-01
Abstract: Story visualization aims to generate realistic and coherent images from multi-sentence stories. However, current methods struggle to achieve high-quality image generation while keeping models lightweight and generation fast. The main issue lies in the two existing frameworks. The independent framework prioritizes speed but sacrifices image quality because its image generation process is non-collaborative and it relies on basic GAN-based learning. The autoregressive framework adapts a large pretrained text-to-image model to generate images autoregressively with additional history modules, which leads to a large model size, heavy resource requirements, and slow generation. To address these issues, we propose a lightweight and effective framework, namely CoIn. Specifically, we introduce a Context-aware Story Generator to predict shared context semantics for each image generator. Additionally, we propose an Intra-Story Interchange module that allows each image generator to exchange visual information with the other image generators. Furthermore, we incorporate DINOv2 into the story and image discriminators to assess story image quality more accurately. Extensive experiments show that CoIn retains the model size and generation speed of the independent framework while achieving promising story image quality.
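The abstract does not say how the Intra-Story Interchange module exchanges visual information between image generators. Purely as an illustration of the general idea, the sketch below blends each frame's feature vector with the mean of the other frames' vectors. The function name `interchange`, the `mix` parameter, and the averaging rule are all assumptions made for this example, not the paper's design.

```python
from typing import List


def interchange(features: List[List[float]], mix: float = 0.5) -> List[List[float]]:
    """Blend each frame's features with the mean of the other frames' features.

    Hypothetical stand-in for an intra-story exchange step; the actual
    module in the paper may work very differently.
    """
    n = len(features)
    if n < 2:
        # Nothing to exchange with: return a copy unchanged.
        return [list(f) for f in features]
    dim = len(features[0])
    # Per-dimension totals let us compute "mean of the others" cheaply.
    totals = [sum(f[d] for f in features) for d in range(dim)]
    out = []
    for f in features:
        others_mean = [(totals[d] - f[d]) / (n - 1) for d in range(dim)]
        out.append([(1 - mix) * f[d] + mix * others_mean[d] for d in range(dim)])
    return out


# Three frames of one story, each with a 2-dimensional feature vector.
story = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed = interchange(story, mix=0.5)
```

With `mix=0.0` every frame keeps its own features, which corresponds to fully independent generation; larger values pull the frames toward a shared representation.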