Evaluating Creative Short Story Generation in Humans and Large Language Models

Mete Ismayilzada, Claire Stevenson, Lonneke van der Plas
2024-11-05
Abstract: Storytelling is a fundamental aspect of human communication, relying heavily on creativity to produce narratives that are novel, appropriate, and surprising. While large language models (LLMs) have recently demonstrated the ability to generate high-quality stories, their creative capabilities remain underexplored. Previous research has either focused on creativity tests requiring short responses or primarily compared model performance in story generation to that of professional writers. However, the question of whether LLMs exhibit creativity in writing short stories on par with the average human remains unanswered. In this work, we conduct a systematic analysis of creativity in short story generation across LLMs and everyday people. Using a five-sentence creative story task, commonly employed in psychology to assess human creativity, we automatically evaluate model- and human-generated stories across several dimensions of creativity, including novelty, surprise, and diversity. Our findings reveal that while LLMs can generate stylistically complex stories, they tend to fall short in terms of creativity when compared to average human writers.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks whether large language models (LLMs) can match ordinary people in creativity when writing short stories. Specifically, the researchers focus on the following aspects:

1. **Dimensions of Creativity**: Novelty, surprise, and diversity. These dimensions measure whether the generated stories are innovative, unexpected, and varied in content.
2. **Task Setup**: A five-sentence creative story task, a method commonly used in psychology to evaluate human creativity. Participants write a short story of four to five sentences based on three prompt words.
3. **Comparison Objects**: Stories generated by LLMs are compared with those written by ordinary people rather than only with those of professional writers, which gives a fuller picture of how LLMs perform in creative writing.
4. **Evaluation Method**: Creativity is quantified with automated evaluation metrics so that the evaluation is objective and repeatable.

### Research Background

- **Importance of Story Creation**: Storytelling is a core part of human communication and relies on creativity to produce novel, appropriate, and surprising narratives.
- **Advances and Limitations of LLMs**: Although LLMs have demonstrated the ability to generate high-quality stories, their creativity has not been fully explored.
- **Deficiencies of Existing Research**: Most previous studies either focused on creativity tests requiring brief responses or mainly compared model performance with that of professional writers, without involving ordinary people.

### Research Objectives

This study aims to systematically analyze the differences in creativity between short stories generated by LLMs and those written by ordinary people, and in particular to reveal the strengths and weaknesses of LLMs in creative writing through multi-dimensional creativity evaluation metrics.
The results show that although LLMs can generate stylistically complex stories, they fall short of ordinary people in novelty, diversity, and surprise.

### Main Findings

- **Complexity vs. Creativity**: The stories generated by LLMs are more complex in vocabulary and syntax but less readable, while human stories are more novel, surprising, and diverse.
- **Impact of Semantic Distance**: When the semantic distance between prompt words is small, the stories generated by both humans and LLMs are more novel.
- **Pronoun Use**: Humans are more inclined to write from the first- or second-person perspective, while LLMs prefer the third person.

Through these findings, the researchers hope to guide future research on improving the creativity of LLMs and to emphasize the importance of evaluating creativity comprehensively.
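The notion of semantic distance between prompt words can be made concrete with a small example. The sketch below is illustrative only: it assumes semantic distance is measured as cosine distance between word embeddings and averaged over the three word pairs, which this summary does not confirm as the paper's exact procedure. The function names, prompt words, and three-dimensional vectors are hypothetical toy values, not data from the study.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(vectors):
    """Average cosine distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings; a real analysis would load vectors
# from a trained embedding model instead.
TOY_EMBEDDINGS = {
    "stamp": [1.0, 0.0, 0.0],
    "letter": [0.8, 0.6, 0.0],
    "send": [0.6, 0.8, 0.0],
    "gloom": [0.0, 0.0, 1.0],
    "payment": [0.0, 1.0, 0.0],
}

# A closely related prompt triple versus a loosely related one.
related = [TOY_EMBEDDINGS[w] for w in ("stamp", "letter", "send")]
unrelated = [TOY_EMBEDDINGS[w] for w in ("stamp", "payment", "gloom")]

related_distance = mean_pairwise_distance(related)
unrelated_distance = mean_pairwise_distance(unrelated)

print(f"related prompt words:   {related_distance:.3f}")
print(f"unrelated prompt words: {unrelated_distance:.3f}")
```

With these toy vectors the related triple yields a smaller mean distance than the unrelated one, which is the property such a measure needs in order to separate prompts with low and high semantic distance.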