Screenplay Summarization Using Latent Narrative Structure

Pinelopi Papalampidi, Frank Keller, Lea Frermann, Mirella Lapata
DOI: https://doi.org/10.48550/arXiv.2004.12727
2020-04-27
Abstract: Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront. As a result, such models are biased on position and often perform a smart selection of sentences from the beginning of the document. When summarizing long narratives, which have complex structure and present information piecemeal, simple position heuristics are not sufficient. In this paper, we propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models. We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays (i.e., extract an optimal sequence of scenes). Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode and improve summarization performance over general extractive algorithms leading to more complete and diverse summaries.
Computation and Language
What problem does this paper attempt to address?
Existing extractive summarization models are trained mostly on news articles, which are short and present the important information up front. Such models therefore learn a position bias and tend to select sentences from the beginning of the document. This heuristic fails on long narratives such as screenplays, which have complex structure and reveal information piecemeal. The authors propose to explicitly incorporate narrative structure into general unsupervised and supervised extractive summarization models in order to produce more complete and diverse screenplay summaries.

The paper addresses the problem in three ways:

1. **Introducing Narrative Structure**: Narrative structure is formalized in terms of key narrative events (turning points), which are treated as latent.
2. **Improving Summarization Algorithms**: General unsupervised and supervised extractive algorithms are extended with this structure so that they extract a better sequence of scenes from long narratives.
3. **Experimental Validation**: Experiments on the CSI corpus of TV screenplays, augmented with scene-level summarization labels, show that latent turning points correlate with important aspects of an episode and improve summarization performance over general extractive algorithms.

The aim is for the summaries to capture the key events of the story and so reflect the content of the screenplay more completely and with greater diversity.
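To make the contrast between a position heuristic and structure-aware selection concrete, here is a minimal sketch. It is not the authors' model: the paper treats turning points as latent and learns them, whereas this toy assumes per-scene turning-point probabilities are already given. The function names, the `weight` parameter, and the linear mixing of scores are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def lead_k(num_scenes: int, k: int) -> List[int]:
    """Position-only baseline: return the indices of the first k scenes."""
    return list(range(min(k, num_scenes)))


def structure_aware_select(
    salience: Sequence[float],
    tp_probs: Sequence[Sequence[float]],
    k: int,
    weight: float = 0.5,
) -> List[int]:
    """Pick k scenes by mixing content salience with turning-point relevance.

    salience[i]    : content score of scene i (assumed given).
    tp_probs[t][i] : assumed probability that scene i realizes turning point t.
    weight         : hypothetical mixing coefficient in [0, 1].
    """
    n = len(salience)
    scores = []
    for i in range(n):
        # A scene's structural relevance is its best match to any turning point.
        tp_relevance = max((row[i] for row in tp_probs), default=0.0)
        scores.append((1.0 - weight) * salience[i] + weight * tp_relevance)
    # Take the k highest-scoring scenes, then restore narrative order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy episode with six scenes; salience is highest early on, as in news text.
salience = [0.9, 0.8, 0.1, 0.2, 0.1, 0.3]
tp_probs = [
    [0.0, 0.0, 0.9, 0.1, 0.0, 0.0],  # hypothetical turning point 1
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.8],  # hypothetical turning point 2
]
baseline = lead_k(len(salience), 2)
structured = structure_aware_select(salience, tp_probs, k=2)
```

In this toy input the baseline keeps only the opening scenes, while the structure-aware variant reaches scenes later in the episode that align with the assumed turning points. Setting `weight=0.0` reduces the second function to plain salience ranking.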