Visual Story Generation Based on Emotion and Keywords

Yuetian Chen, Ruohua Li, Bowen Shi, Peiru Liu, Mei Si
DOI: https://doi.org/10.48550/arXiv.2301.02777
2023-01-07
Abstract: Automated visual story generation aims to produce stories with corresponding illustrations that exhibit coherence, progression, and adherence to characters' emotional development. This work proposes a story generation pipeline to co-create visual stories with the users. The pipeline allows the user to control events and emotions on the generated content. The pipeline includes two parts: narrative and image generation. For narrative generation, the system generates the next sentence using user-specified keywords and emotion labels. For image generation, diffusion models are used to create a visually appealing image corresponding to each generated sentence. Further, object recognition is applied to the generated images to allow objects in these images to be mentioned in future story development.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to automatically generate visual stories that are coherent, progress logically, and stay consistent with the characters' emotional development, using emotions and keywords as guidance. Specifically, the authors propose an interactive visual story generation pipeline that lets users control the events and emotions in the generated content, and that uses image generation techniques to create an illustration for each generated sentence.

### Specific description of the problem:

1. **Coherence and progression**: Traditional automatic story generation systems often struggle to keep long stories both coherent and moving forward. This paper uses emotions and keywords as generation prompts so that the story stays coherent while still progressing in a reasonable way.
2. **Emotional consistency**: The emotional development of characters is a key part of what makes a story engaging. Existing methods have shortcomings in handling emotional change, so the stories they generate can lack emotional depth or consistency. This paper improves emotional consistency by using Plutchik's wheel of emotions to predict and control the emotion label of each sentence.
3. **User interaction**: Many automatic story generation systems leave the user out of the process, so the output may not match what the user expects. The proposed system lets users supply keywords and emotion labels during generation, so the story better reflects their intentions and creativity.
4. **Visualization**: Stories that rely on text alone may not be vivid enough. This paper uses image generation techniques (such as diffusion models) to create an image for each generated sentence, making the story more expressive and immersive.
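
To make the emotion-label idea concrete, the sketch below encodes Plutchik's eight primary emotions as four opposing pairs and turns per-emotion scores into labels. It assumes a multi-label classifier (for example a BERT model with one output per emotion) that emits one logit per emotion in a fixed order; the 0.5 threshold, the ordering of results, and the `conflicts` check are illustrative assumptions, not settings reported in the paper.

```python
import math

# Plutchik's eight primary emotions form four opposing pairs on the wheel.
_PAIRS = [
    ("joy", "sadness"),
    ("trust", "disgust"),
    ("fear", "anger"),
    ("surprise", "anticipation"),
]

# Wheel order: joy, trust, fear, surprise, sadness, disgust, anger, anticipation.
EMOTIONS = [a for a, _ in _PAIRS] + [b for _, b in _PAIRS]

# Symmetric lookup table: every emotion maps to its opposite.
OPPOSITE = {}
for a, b in _PAIRS:
    OPPOSITE[a] = b
    OPPOSITE[b] = a


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def labels_from_logits(logits, threshold=0.5):
    """Map one logit per emotion (in EMOTIONS order) to the labels whose
    sigmoid probability reaches `threshold`, most probable first.
    The threshold is an assumed value, not one taken from the paper."""
    if len(logits) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} logits, got {len(logits)}")
    scored = [(sigmoid(z), e) for z, e in zip(logits, EMOTIONS)]
    kept = [(p, e) for p, e in scored if p >= threshold]
    kept.sort(key=lambda pe: pe[0], reverse=True)
    return [e for _, e in kept]


def conflicts(labels):
    """Return the opposing pairs that appear together in `labels`."""
    present = set(labels)
    return [(a, b) for a, b in _PAIRS if a in present and b in present]


labels = labels_from_logits([2.0, -1.0, -3.0, 0.5, -2.0, -4.0, -1.0, 1.0])
print(labels)             # ['joy', 'anticipation', 'surprise']
print(conflicts(labels))  # [('surprise', 'anticipation')]
```

Using the wheel's opposites is one simple way to flag a predicted label set that mixes contradictory emotions before it is passed on as a control signal.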

### Overview of the solution:

- **Emotion prediction**: A BERT model predicts the emotions of the existing story and produces the emotion label for the next sentence.
- **Keyword extraction**: Keywords are extracted from the generated images so they can be used in later story development.
- **Sentence generation**: A T5 model generates the next sentence of the story from the provided keywords and emotion label.
- **Image generation**: Models such as Disco Diffusion generate an image for each generated sentence, and object detection models (such as YOLOv5 and Faster R-CNN) extract keywords from those images.

With these components, the paper presents an interactive visual story generation framework intended to improve both the quality of the generated stories and the user experience.
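
The feedback loop above, where objects detected in a generated image become candidate keywords for the next sentence, can be sketched in plain Python. Everything here is hypothetical: the `Detection` type, the confidence threshold, the keyword cap, the rule that user keywords take priority, and the text template fed to T5 are assumptions for illustration. Real YOLOv5 or Faster R-CNN output would first need converting into this form, and the paper's actual T5 input format is not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: one labelled box (coordinates omitted)."""
    label: str
    confidence: float


def keywords_from_detections(detections, min_conf=0.5, max_keywords=3):
    """Keep each object label once (at its best score), drop low-confidence
    detections, and return up to `max_keywords` labels, highest score first."""
    best = {}
    for d in detections:
        if d.confidence < min_conf:
            continue
        best[d.label] = max(d.confidence, best.get(d.label, 0.0))
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:max_keywords]]


def merge_keywords(user_keywords, image_keywords):
    """User-supplied keywords come first; image keywords fill in, without duplicates."""
    merged = []
    for kw in list(user_keywords) + list(image_keywords):
        if kw not in merged:
            merged.append(kw)
    return merged


def build_t5_input(story, keywords, emotion):
    """Flatten the story so far plus the control signals into one source string.
    The field names and separators are a made-up template, not the paper's."""
    return f"emotion: {emotion} | keywords: {', '.join(keywords)} | story: {' '.join(story)}"


story = ["Tom took his dog to the park."]
dets = [Detection("dog", 0.92), Detection("person", 0.81),
        Detection("dog", 0.60), Detection("kite", 0.30)]
image_kws = keywords_from_detections(dets)   # ['dog', 'person']
kws = merge_keywords(["ball"], image_kws)    # ['ball', 'dog', 'person']
print(build_t5_input(story, kws, "joy"))
# emotion: joy | keywords: ball, dog, person | story: Tom took his dog to the park.
```

In a full pipeline, the resulting string would be passed to a fine-tuned T5 model to produce the next sentence, which would in turn become the prompt for the diffusion model.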