Summarization of Films and Documentaries Based on Subtitles and Scripts

Marta Aparício, Paulo Figueiredo, Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Luís Marujo
DOI: https://doi.org/10.1016/j.patrec.2015.12.016
2016-03-10
Abstract: We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles.
Computation and Language, Artificial Intelligence, Information Retrieval
What problem does this paper attempt to address?
This paper evaluates how well generic text summarization algorithms perform on films and documentaries. The researchers run these algorithms on three datasets (news articles, film scripts and subtitles, and documentary subtitles) and use standard ROUGE metrics to compare the generated summaries against reference texts: news abstracts, plot summaries, and synopses. The well-known behavior of these algorithms on news articles serves as the baseline, so the main goal is to determine whether their relative performance on films and documentaries follows the same pattern, and how the type of content (news, films, or documentaries) affects summarization quality.
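As a rough illustration of the kind of ROUGE comparison described above, the following Python sketch computes ROUGE-N recall: the fraction of reference n-grams that also appear in a generated summary. It is a simplified sketch under stated assumptions, not the paper's evaluation setup: it uses whitespace tokenization, lowercasing, and a single reference text, and the function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall against a single reference.

    Assumption: tokens are lowercase whitespace-separated words; a real
    evaluation would typically apply more careful preprocessing.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n_recall(generated, reference, n=1))
    print(rouge_n_recall(generated, reference, n=2))
```

In this toy example the generated summary covers five of the six reference unigrams and three of the five reference bigrams. Recall is used here because it rewards a summary for covering the content of the reference, which is the comparison the abstract describes between generated summaries and news abstracts, plot summaries, or synopses.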