MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
DOI: https://doi.org/10.48550/arXiv.2109.05569
2022-10-24
Abstract: Understanding movies and their structural patterns is a crucial task in decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the Cut type recognition task, which requires modeling multi-modal information. To ignite research in this new task, we construct a large-scale dataset called MovieCuts, which contains 173,967 video clips labeled with ten cut types defined by professionals in the movie industry. We benchmark a set of audio-visual approaches, including some dealing with the problem's multi-modal nature. Our best model achieves 47.7% mAP, which suggests that the task is challenging and that attaining highly accurate Cut type recognition is an open research problem. Advances in automatic Cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, machine-assisted video editing, among others. Our data and code are publicly available: https://github.com/PardoAlejo/MovieCuts.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?

This paper addresses the automatic recognition of **cut** types. The cut is the most basic edit in film editing, and also a crucial one. Specifically, the paper introduces a new task, **cut-type recognition**, and constructs a large-scale dataset, MovieCuts, to support it.

#### Main problem background:

1. **Understanding of film editing**:
   - Understanding films and their structural patterns is crucial for decoding the craft of video editing. Previous research has developed tools for general analysis, such as detecting characters or recognizing cinematographic properties at the shot level, but comparatively little work has addressed the most basic video edit, the cut.
2. **Importance of the cut**:
   - The cut is the most commonly used transition in films. It is not just a switch between two shots; it serves specific purposes and carries meaning. Different cut types can convey different semantic information, such as an emotional expression or a change of perspective.
3. **Deficiencies of existing tools**:
   - Some tools, such as Cinemetrics, can analyze cut frequency in films, but understanding the semantic meaning of those cuts remains a challenge. Existing computer vision methods focus mainly on shot-level analysis, and automatic recognition of cut types has not been studied in depth.

#### Specific objectives of the paper:

- **Introducing the cut-type recognition task**: Formally define and study the new task, which requires models to handle multimodal information (audio and video).
- **Constructing a large-scale dataset**: Create MovieCuts, a large-scale dataset of 173,967 video clips labeled with ten cut types defined by movie-industry professionals.
- **Establishing a benchmark**: Implement several audio-visual baseline models and establish a benchmark for cut-type recognition. The best model reaches 47.7% mAP (mean Average Precision), which indicates that the task is challenging and remains an open research problem.

#### Potential applications:

- **Education and analysis**: Automatic cut-type recognition can assist film analysis and education, giving a deeper understanding of film-editing styles.
- **Video re-editing**: Support for scenarios such as video re-editing, virtual cinematography, machine-assisted trailer generation, and machine-assisted video editing.

By addressing these problems, the research provides new tools and methods for automatically understanding and analyzing film editing, and supports further progress in video editing.
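
The mAP figure cited above can be made concrete with a small sketch. The code below is not taken from the MovieCuts repository. It assumes a multi-label setup in which each clip carries a 0/1 indicator per cut type and the model outputs one score per cut type, and it uses the common non-interpolated definition of average precision. The function names `average_precision` and `mean_average_precision` are illustrative only.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one class.

    Clips are ranked by descending score; AP is the mean of the precision
    values measured at the rank of each positive clip.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a class with no positive clips contributes an AP of 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Mean of the per-class APs over all classes (columns)."""
    num_classes = len(y_true[0])
    aps = []
    for c in range(num_classes):
        class_labels = [row[c] for row in y_true]
        class_scores = [row[c] for row in y_score]
        aps.append(average_precision(class_labels, class_scores))
    return sum(aps) / num_classes


if __name__ == "__main__":
    # Three hypothetical clips and two hypothetical cut types.
    y_true = [[0, 1], [1, 0], [1, 1]]
    y_score = [[0.9, 0.7], [0.8, 0.2], [0.1, 0.6]]
    print(f"mAP = {mean_average_precision(y_true, y_score):.3f}")
```

Under this definition, a class scores 1.0 only when every positive clip is ranked above every negative one. Averaging over classes gives rare cut types the same weight as frequent ones.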