ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts

Vishruth Veerendranath, Vibha Masti, Utkarsh Gupta, Hrishit Chaudhuri, Gowri Srinivasa
DOI: https://doi.org/10.1145/3639856.3639891
2024-01-13
Abstract: Film scores are considered an essential part of the cinematic experience, but composing a film score is often expensive and infeasible for small-scale creators. Automating film score composition would provide useful starting points for music in small projects. In this paper, we propose a two-stage pipeline for generating music from a movie script. The first phase is the Sentiment Analysis phase, in which the sentiment of a scene from the film script is encoded into the continuous valence-arousal space. The second phase is the Conditional Music Generation phase, which takes the valence-arousal vector as input and conditionally generates piano MIDI music to match that sentiment. We study the efficacy of various music generation architectures through a qualitative user survey and propose methods to improve sentiment conditioning in VAE architectures.
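To make the output of the first phase concrete, the sketch below maps a scene's text to a single (valence, arousal) point by averaging per-word scores. This is a minimal illustration under stated assumptions, not the paper's sentiment model: the lexicon `TOY_VA_LEXICON`, its values, and the function `scene_to_va` are hypothetical, and both coordinates are assumed to lie in [-1, 1] with (0, 0) as the neutral point.

```python
import re

# Hypothetical toy lexicon: word -> (valence, arousal), each in [-1, 1].
# The values are illustrative only; they are not from the paper or any published lexicon.
TOY_VA_LEXICON = {
    "love": (0.9, 0.5),
    "calm": (0.6, -0.7),
    "scream": (-0.6, 0.9),
    "chase": (-0.2, 0.8),
    "grief": (-0.8, -0.4),
    "quiet": (0.2, -0.6),
}


def scene_to_va(scene_text: str) -> tuple[float, float]:
    """Average the (valence, arousal) scores of lexicon words found in a scene.

    Returns (0.0, 0.0), the assumed neutral point, when no lexicon word appears.
    """
    tokens = re.findall(r"[a-z']+", scene_text.lower())
    hits = [TOY_VA_LEXICON[t] for t in tokens if t in TOY_VA_LEXICON]
    if not hits:
        return (0.0, 0.0)
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return (valence, arousal)


# A sombre scene lands in the low-valence, low-arousal quadrant.
print(scene_to_va("INT. HOSPITAL - NIGHT. A quiet room heavy with grief."))
```

A learned text model would replace the lexicon lookup in practice; the point here is only the interface between the two phases, a two-dimensional vector per scene that the music generator can be conditioned on.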
Multimedia; Sound; Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the problem of automatically generating music for movie scripts. Conventional film scoring is expensive and therefore impractical for small-scale creators. ScripTONES is a two-stage pipeline that generates music based on the sentiment expressed in the script. The first stage is sentiment analysis: the sentiment of each scene is extracted from the script text and encoded as a point in the continuous valence-arousal space. The second stage is conditional music generation: the valence-arousal vector is used as input to generate piano MIDI music that matches the sentiment of the scene. Several music generation architectures, based on Transformers and VAEs, are compared, and methods are proposed to improve sentiment conditioning in the VAE architectures. The architectures are evaluated through a qualitative user survey that assesses how well the generated music matches the sentiment of the movie scenes. Future work will explore additional architectures to improve the model's music generation ability.
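One common way to condition a VAE on an external signal is to concatenate the condition vector to the latent code before decoding (a conditional VAE). The sketch below shows that data flow for a single decoding step using only the standard library. It is an assumption-laden illustration, not the paper's architecture or its proposed conditioning improvement: the latent size `LATENT_DIM`, the single linear layer `W`, and the helper names `reparameterize` and `decode_step` are hypothetical, and the weights are random and untrained, so the output distribution is not musically meaningful.

```python
import math
import random

N_PITCHES = 128   # MIDI pitch range 0-127
LATENT_DIM = 4    # hypothetical latent size, chosen only for illustration
COND_DIM = 2      # (valence, arousal)

rng = random.Random(0)
# Hypothetical, untrained decoder: one linear layer mapping [z; c] to pitch logits.
W = [[rng.gauss(0.0, 0.1) for _ in range(LATENT_DIM + COND_DIM)] for _ in range(N_PITCHES)]
b = [0.0] * N_PITCHES


def reparameterize(mu, log_var, rng):
    """Standard VAE sampling trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def decode_step(z, va):
    """Condition the decoder by concatenating the valence-arousal vector to z,
    then return a softmax distribution over the 128 MIDI pitches."""
    x = list(z) + list(va)
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j for row, b_j in zip(W, b)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Sample one latent from the prior, then decode it under two different sentiments.
z = reparameterize([0.0] * LATENT_DIM, [0.0] * LATENT_DIM, rng)
probs_sad = decode_step(z, (-0.8, -0.4))
probs_tense = decode_step(z, (-0.2, 0.8))
```

Because the same `z` is decoded under two different valence-arousal vectors, any difference between `probs_sad` and `probs_tense` comes only from the condition. A trained model would learn weights that make this difference musically meaningful and would decode a full note sequence rather than a single step.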