Abstract: Generative AI has demonstrated unprecedented creativity in the field of computer vision, yet such phenomena have not been observed in natural language processing. In particular, large language models (LLMs) can hardly produce written works at the level of human experts due to the extremely high complexity of literature writing. In this paper, we present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting, which is a highly demanding task. Mimicking the human creative process, we assign LLMs to different roles involved in the real-world scenario. In addition to the common practice of treating LLMs as ${Writer}$, we also apply LLMs as ${Editor}$, who is responsible for providing feedback and revision advice to ${Writer}$. Besides, to enrich the characters and deepen the plots, we introduce a role-playing mechanism and adopt LLMs as ${Actors}$ that can communicate and interact with each other. Evaluations on automatically generated screenplays show that HoLLMwood substantially outperforms strong baselines in terms of coherence, relevance, interestingness and overall quality.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the inadequacies of large language models (LLMs) in literary creation, particularly in screenwriting. Despite generative AI demonstrating unprecedented creativity in artistic creation, especially in the field of computer vision, this phenomenon has not yet been observed in natural language processing, particularly in literary writing. Specifically, current LLMs struggle to produce literary works comparable to those of human experts due to the high complexity of literary writing.

The paper proposes an automated framework named **HoLLMwood** that unleashes the creativity of LLMs through a role-playing mechanism and explores their potential in the demanding task of screenwriting. This framework simulates the human creative process by assigning LLMs to different roles, such as **Writer**, **Editor**, and **Actors**. These roles are responsible for writing the story, providing feedback and revision suggestions, and enriching character dialogues and interactions through role-playing.

### Main Contributions

1. **Experimental results reveal the difficulty of LLMs in generating high-quality literary works under simple guidance**:
   - Particularly in generating scripts with vivid characters and engaging plots, LLMs perform poorly.
   - Dialogues and interactions often appear mechanical and dull, indicating the challenges of directly applying LLMs to creative tasks.
2. **Proposes a fully automated screenwriting framework HoLLMwood**:
   - This framework not only enables non-professionals to create engaging scripts but also provides auxiliary tools for industry professionals.
   - Users only need to provide an initial storyline, and the framework can automatically handle complex tasks, democratizing a field traditionally requiring extensive experience and specific skills.
3. **Experimental evaluation shows the superiority of HoLLMwood in multiple dimensions**:
   - Using GPT-4 for pairwise comparison, results show that scripts generated by HoLLMwood significantly outperform other methods in coherence, relevance, and interestingness.
   - Ablation experiments further demonstrate the positive contribution of the feedback-revision mechanism and role-playing mechanism to the final script quality.

### Experimental Setup and Results

- **Dataset**: Initial storylines of different movie genres synthesized by LLMs were used as input, including 6 types: romance, sci-fi, horror, drama, crime, and comedy, with 10 examples generated for each type, totaling 60 instances.
- **Baseline Methods**: Including Plan-then-Write and DOC-screen methods, which generate scripts for each episode sequentially based on designed roles and outlines, or generate story chapters using DOC and then generate scripts.
- **Evaluation**: Pairwise comparison using GPT-4, evaluating scripts from four dimensions: coherence, relevance, interestingness, and overall quality. Results show that HoLLMwood significantly outperforms baseline methods in all dimensions, especially in interestingness and overall quality.

### Conclusion

HoLLMwood significantly enhances the performance of LLMs in screenwriting tasks through role-playing and feedback-revision mechanisms, enabling them to generate high-quality scripts. This framework not only helps non-professionals create engaging works but also provides powerful auxiliary tools for industry professionals. In the future, as the capabilities of foundational models improve, this framework is expected to generate scripts approaching human-level quality.
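
### Illustrative Sketch of the Role Assignment

The Writer, Editor, and Actors roles described above can be pictured as a small control loop. The following is a minimal sketch and not the authors' implementation: the function names (`write_with_feedback`, `role_play`), the prompt wording, the `"APPROVED"` stop signal, and the round and turn limits are all assumptions made for illustration. An "LLM" here is simply any function that maps a prompt string to a response string, so the stubs at the bottom stand in for real model calls.

```python
from typing import Callable

# Hypothetical abstraction: an "LLM" is any function from prompt to response.
LLM = Callable[[str], str]


def write_with_feedback(storyline: str, writer: LLM, editor: LLM,
                        max_rounds: int = 2) -> str:
    """Writer drafts; Editor gives advice; Writer revises (assumed loop shape)."""
    draft = writer(f"Write a plot outline for this storyline:\n{storyline}")
    for _ in range(max_rounds):
        feedback = editor(f"Give revision advice for this draft:\n{draft}")
        # Assumed stop signal; the source does not specify a stopping rule.
        if feedback.strip().upper() == "APPROVED":
            break
        draft = writer(
            f"Revise the draft using the advice.\nDraft:\n{draft}\nAdvice:\n{feedback}"
        )
    return draft


def role_play(scene: str, actors: dict[str, LLM], turns: int = 4) -> list[str]:
    """Actors speak in round-robin order; each sees the dialogue so far."""
    dialogue: list[str] = []
    names = list(actors)  # dicts preserve insertion order
    for turn in range(turns):
        name = names[turn % len(names)]
        history = "\n".join(dialogue)
        line = actors[name](
            f"Scene: {scene}\nDialogue so far:\n{history}\nSpeak as {name}:"
        )
        dialogue.append(f"{name}: {line}")
    return dialogue


# Deterministic stubs standing in for real model calls.
def stub_writer(prompt: str) -> str:
    return "draft v2" if prompt.startswith("Revise") else "draft v1"


def stub_editor(prompt: str) -> str:
    return "APPROVED" if "draft v2" in prompt else "Add more conflict."


final_draft = write_with_feedback("A heist goes wrong.", stub_writer, stub_editor)
lines = role_play("The vault", {"Ann": lambda p: "Go.", "Bob": lambda p: "Wait."}, turns=3)
```

Keeping each role behind the same prompt-to-text interface means the same loop could be driven by one underlying model with different prompts or by separate models, which is a design choice of this sketch rather than something the summary specifies.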

HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing