Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas

Carlo Bretti, Pascal Mettes, Hendrik Vincent Koops, Daan Odijk, Nanne van Noord
DOI: https://doi.org/10.1007/978-3-031-53308-2_15
2024-01-29
Abstract: Creating a trailer requires carefully picking out and piecing together brief enticing moments out of a longer video, making it a challenging and time-consuming task. This requires selecting moments based on both visual and dialogue information. We introduce a multi-modal method for predicting the trailerness to assist editors in selecting trailer-worthy moments from long-form videos. We present results on a newly introduced soap opera dataset, demonstrating that predicting trailerness is a challenging task that benefits from multi-modal information. Code is available at
Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
This paper addresses the problem of automatically identifying segments of TV episodes, particularly soap opera episodes, that are suitable for use in trailers. Specifically, the research aims to assist editors in selecting moments with high "trailerness" from long-form videos so that they can assemble trailers that attract viewers. To this end, the authors propose a multi-modal approach to predicting the trailerness of video segments. The method combines visual and dialogue information and can operate at different time scales. In addition, the authors construct a new dataset, GTST, which contains 63 episodes of the Dutch soap opera "Goede Tijden, Slechte Tijden" together with their corresponding trailers. Because these trailers were edited by professionals, they can serve as ground truth for training and evaluating multi-modal trailerness prediction models.

In their experiments, the authors show that the multi-modal, multi-scale approach outperforms methods that use only a single modality or a single time scale. The best performance is obtained when the visual and textual modalities are combined and fused across time scales, which indicates that both visual appeal and dialogue content matter when selecting moments for a trailer. At the same time, the results confirm that trailerness prediction remains a challenging task. In summary, the paper contributes a multi-modal, multi-scale method for identifying trailer-worthy moments in TV episodes, the GTST dataset of soap opera episodes paired with professionally edited trailers, and an experimental evaluation showing that multi-modal information improves trailerness prediction.
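To make the idea of multi-modal, multi-scale fusion more concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the authors' architecture: the per-segment feature vectors, the linear scoring heads (`w_vis`, `w_txt`), the window sizes in `scales`, and the equal-weight averaging over modalities and scales are all hypothetical choices made for illustration. Each segment receives a visual score and a dialogue score, each score sequence is smoothed over windows of several sizes to capture context at different time scales, and all resulting scores are averaged and squashed into a trailerness value between 0 and 1.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: NOT the paper's model. Features, linear heads,
# window sizes and equal-weight late fusion are illustrative assumptions.


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Per-segment logit from a (hypothetical) pre-trained linear head."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def smooth(scores: List[float], window: int) -> List[float]:
    """Average each score with its neighbours in a centred window.

    Windows are truncated at the episode boundaries, so edge segments
    are averaged over fewer neighbours.
    """
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def trailerness(
    visual: Sequence[Sequence[float]],
    dialogue: Sequence[Sequence[float]],
    w_vis: Sequence[float],
    w_txt: Sequence[float],
    scales: Tuple[int, ...] = (1, 3, 5),
) -> List[float]:
    """Fuse visual and dialogue scores over several time scales."""
    vis = [linear_score(v, w_vis) for v in visual]
    txt = [linear_score(t, w_txt) for t in dialogue]
    fused = [0.0] * len(vis)
    n_views = 0
    # Late fusion: one smoothed score sequence per (modality, scale) pair,
    # all weighted equally.
    for modality in (vis, txt):
        for s in scales:
            for i, val in enumerate(smooth(modality, s)):
                fused[i] += val
            n_views += 1
    return [sigmoid(f / n_views) for f in fused]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k segments with the highest trailerness."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy episode with four segments: 2-d visual and 1-d dialogue features.
    visual = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.0, 0.1]]
    dialogue = [[0.0], [1.0], [0.5], [0.0]]
    scores = trailerness(visual, dialogue, w_vis=[1.0, 1.0], w_txt=[2.0])
    print(top_k(scores, 2))  # → [1, 2]
```

Averaging is the simplest possible late-fusion rule; a trained model could instead learn separate weights per modality and scale. In an editor-facing tool, the top-ranked segments would be offered as candidates for the editor to review rather than cut into a trailer automatically.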