Abstract: Automated visual story generation aims to produce stories with corresponding illustrations that exhibit coherence, progression, and adherence to characters' emotional development. This work proposes a story generation pipeline to co-create visual stories with the users. The pipeline allows the user to control events and emotions on the generated content. The pipeline includes two parts: narrative and image generation. For narrative generation, the system generates the next sentence using user-specified keywords and emotion labels. For image generation, diffusion models are used to create a visually appealing image corresponding to each generated sentence. Further, object recognition is applied to the generated images to allow objects in these images to be mentioned in future story development.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: how to automatically generate coherent, progressive visual stories that are in line with the emotional development of characters based on emotions and keywords. Specifically, the author proposes an interactive visual story-generation pipeline, which allows users to control the events and emotions in the generated content, and combines image-generation techniques to create corresponding visual images for each generated sentence.

### Specific description of the problem:

1. **Coherence and progressiveness**: Traditional automatic story-generation systems often have difficulty ensuring the coherence and progressiveness of stories when generating long-form stories. This paper aims to ensure that the generated stories are not only coherent but also have reasonable progress by introducing emotions and keywords as generation prompts.
2. **Emotional consistency**: The emotional development of characters in a story is an important part of the story's attractiveness. Existing methods have shortcomings in dealing with emotional changes, resulting in generated stories that may lack emotional depth or consistency. This paper improves emotional consistency by using Plutchik's emotion wheel model to predict and control the emotional labels of each sentence.
3. **User interaction**: Many automatic story-generation systems ignore user participation, resulting in generated content that may not meet users' expectations. The system proposed in this paper allows users to provide keywords and emotional labels during the generation process, making the generated story more in line with users' intentions and creativity.
4. **Visualization**: Stories that rely solely on text generation may not be vivid enough. This paper combines image-generation techniques (such as diffusion models) to create corresponding images for each generated sentence, enhancing the expressiveness and immersion of the story.
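As an illustration of the emotion labels mentioned under emotional consistency, the following minimal sketch encodes the eight primary emotions of Plutchik's wheel and normalizes a user-supplied label. It assumes the system's label set is exactly these eight primary emotions, which the summary does not state; the names `Emotion` and `parse_emotion` are hypothetical and not taken from the paper.

```python
from enum import Enum


class Emotion(Enum):
    """The eight primary emotions of Plutchik's wheel (assumed label set)."""
    JOY = "joy"
    TRUST = "trust"
    FEAR = "fear"
    SURPRISE = "surprise"
    SADNESS = "sadness"
    DISGUST = "disgust"
    ANGER = "anger"
    ANTICIPATION = "anticipation"


def parse_emotion(label: str) -> Emotion:
    """Normalize a user-supplied label; raises ValueError for unknown labels."""
    return Emotion(label.strip().lower())
```

Rejecting unknown labels early keeps free-form user input from reaching the sentence generator as an unsupported control signal.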
### Overview of the solution:

- **Emotion prediction**: Use the BERT model to predict the emotions of the existing story and generate the emotional label of the next sentence.
- **Keyword extraction**: Extract keywords from the generated images for use in subsequent story development.
- **Sentence generation**: Generate the next sentence of the story with the T5 model, conditioned on the provided keywords and emotional labels.
- **Image generation**: Use models such as Disco Diffusion to generate images corresponding to the generated sentences, and extract keywords from the images through object detection models (such as YOLOv5 and Faster R-CNN).

Through these methods, this paper proposes an interactive visual story-generation framework that aims to improve both the quality of the generated stories and the user experience.
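The components in the overview above can be wired into a single generation step roughly as follows. This is a sketch under stated assumptions, not the paper's implementation: the four models are passed in as plain callables, the prompt format in `build_prompt` is invented for illustration, and falling back to the predicted emotion when the user gives none is an assumed policy. All names (`StoryState`, `build_prompt`, `story_step`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class StoryState:
    """Running state of a co-created story."""
    sentences: List[str] = field(default_factory=list)
    images: List[Any] = field(default_factory=list)
    detected_keywords: List[str] = field(default_factory=list)


def build_prompt(story: List[str], keywords: List[str], emotion: str) -> str:
    # Assumed prompt layout; the summary does not give the real format.
    return (
        f"emotion: {emotion} | keywords: {', '.join(keywords)} | "
        f"story: {' '.join(story)}"
    )


def story_step(
    state: StoryState,
    user_keywords: List[str],
    user_emotion: Optional[str],
    predict_emotion: Callable[[List[str]], str],   # e.g. a BERT classifier
    generate_sentence: Callable[[str], str],       # e.g. a T5 generator
    generate_image: Callable[[str], Any],          # e.g. a diffusion model
    detect_objects: Callable[[Any], List[str]],    # e.g. an object detector
) -> str:
    """Generate one sentence and its image, then update the story state."""
    # Assumed policy: a user-given emotion overrides the predicted one.
    emotion = user_emotion or predict_emotion(state.sentences)
    # Merge user keywords with objects detected in the previous image,
    # dropping duplicates while preserving order.
    keywords = list(dict.fromkeys(user_keywords + state.detected_keywords))
    sentence = generate_sentence(build_prompt(state.sentences, keywords, emotion))
    image = generate_image(sentence)
    state.sentences.append(sentence)
    state.images.append(image)
    # Objects found in the new image become candidate keywords for the next step.
    state.detected_keywords = detect_objects(image)
    return sentence
```

Passing the models in as callables keeps the loop independent of any particular library, so each stage can be replaced by a stub when testing the control flow.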

Visual Story Generation Based on Emotion and Keywords
