Towards Automatic Textual Summarization of Movies

Chang Liu, Mark Last, Armin Shmilovici
DOI: https://doi.org/10.1007/978-3-030-47124-8_39
2020-01-01
Abstract: With the rapidly increasing number of online video resources, the ability to understand those videos automatically is becoming more and more important, since it is almost impossible for people to watch all of the videos and provide textual descriptions. The duration of online videos varies over an extremely wide range, from several seconds to more than 5 hours. In this paper, we focus on long videos, especially full-length movies, and propose the first pipeline for automatically generating textual summaries of such movies. The proposed system takes an entire movie as input (including subtitles), splits it into scenes, generates a one-sentence description for each scene, and summarizes those descriptions and subtitles into a final summary. In our initial experiment on a popular cinema movie (Forrest Gump), we use several existing algorithms and software tools to implement the different components of our system. Most importantly, we use the S2VT (Sequence to Sequence - Video to Text) algorithm for scene description generation and MUSEEC (MUltilingual SEntence Extraction and Compression) for extractive text summarization. We present preliminary results from our prototype experimental framework. An evaluation of the resulting textual summaries for a movie consisting of 156 scenes demonstrates the feasibility of the approach: the summary contains the descriptions of three out of the four most important scenes/storylines in the movie. Although the summaries are far from satisfactory, we argue that the current results can be used to demonstrate the merit of our approach.
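The flow of the pipeline described in the abstract (split into scenes, describe each scene, summarize descriptions together with subtitles) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Scene`, `split_into_scenes`, `extractive_summary`, and `summarize_movie` are hypothetical, scene boundaries are taken as given timestamps rather than detected from video, the `describe` callable is only a stand-in for a captioning model such as S2VT, and the word-frequency scorer is a toy substitute for an extractive summarizer, not MUSEEC.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Scene:
    """Hypothetical scene record: a time span plus the subtitle lines spoken in it."""
    start: float  # seconds
    end: float    # seconds
    subtitles: List[str] = field(default_factory=list)


def split_into_scenes(boundaries: List[float],
                      subtitles: List[Tuple[float, str]]) -> List[Scene]:
    """Group timestamped subtitle lines into scenes.

    Assumption: scene boundaries (in seconds) are already known; a real
    system would obtain them from a scene-detection step on the video.
    """
    scenes = [Scene(s, e) for s, e in zip(boundaries, boundaries[1:])]
    for timestamp, text in subtitles:
        for scene in scenes:
            if scene.start <= timestamp < scene.end:
                scene.subtitles.append(text)
                break
    return scenes


def extractive_summary(sentences: List[str], k: int) -> List[str]:
    """Toy extractive summarizer (not MUSEEC): keep the k sentences with the
    highest average word frequency, returned in their original order."""
    freq = Counter(w for s in sentences for w in s.lower().split())

    def score(sentence: str) -> float:
        words = sentence.lower().split()
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]),
                 reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def summarize_movie(boundaries: List[float],
                    subtitles: List[Tuple[float, str]],
                    describe: Callable[[Scene], str],
                    k: int = 3) -> List[str]:
    """Run the three stages: split, describe each scene, summarize."""
    sentences: List[str] = []
    for scene in split_into_scenes(boundaries, subtitles):
        sentences.append(describe(scene))   # one-sentence scene description
        sentences.extend(scene.subtitles)   # subtitles feed the summarizer too
    return extractive_summary(sentences, k)
```

To try the sketch, pass any function that maps a `Scene` to a sentence as `describe`, for example `lambda s: f"a scene from {s.start:.0f}s to {s.end:.0f}s"`. In a real system that slot would be filled by a video captioning model.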