Exploring challenge and explainable shot type classification using SAM-guided approaches

Fengtian Lu, Yuzhi Li, Feng Tian
DOI: https://doi.org/10.1007/s11760-023-02928-x
IF: 1.583
2024-01-08
Signal Image and Video Processing
Abstract: The language of film shots is an important component of cinematic narrative: it visually conveys story, emotion, and theme, making film a highly expressive and engaging art form. Previous methods for analyzing film shot attributes have focused mainly on movement and scale, with little interpretable research on the results of shot type analysis. In this study, we build a new dataset that broadens the scope of existing shot attribute analysis tasks, for example by distinguishing film composition, and introduce a new task: recognizing the key objects that determine shot attributes. Specifically, we propose a framework that uses cues from the Detection Transformer (DETR) to guide the Segment Anything Model (SAM) in mask segmentation for shot attribute classification. To handle the variable number of key objects within shots, we develop an adaptive weight allocation strategy that enhances network training and provides a more effective approach to the new task. Additionally, we extract optical flow magnitude and angle information from each pair of frames to enhance training effectiveness. Experimental results on MovieShots and our dataset demonstrate that the proposed method surpasses all prior approaches.
engineering, electrical & electronic; imaging science & photographic technology
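
The abstract mentions extracting optical flow magnitude and angle from each pair of frames. As a minimal sketch of that one step, assuming a dense flow field of shape (H, W, 2) has already been estimated by some method (the abstract does not say which estimator the authors use), the conversion to magnitude and angle maps might look like the following. The function name `flow_to_polar` and the synthetic input are illustrative assumptions, not part of the paper.

```python
import numpy as np


def flow_to_polar(flow):
    """Convert a dense flow field (H, W, 2) into magnitude and angle maps.

    flow[..., 0] holds horizontal displacement (dx) and flow[..., 1]
    holds vertical displacement (dy), both in pixels. This layout is an
    assumption; adapt it to whatever the flow estimator returns.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)              # per-pixel displacement length
    angle = np.arctan2(dy, dx) % (2 * np.pi)  # direction in radians, [0, 2*pi)
    return magnitude, angle


# Synthetic stand-in for a real flow field: every pixel moves
# 3 px horizontally and 4 px vertically between the two frames.
flow = np.zeros((4, 6, 2), dtype=np.float32)
flow[..., 0] = 3.0
flow[..., 1] = 4.0

mag, ang = flow_to_polar(flow)
```

The two resulting maps have the same spatial size as the frames, so they could be stacked as extra input channels. How the paper actually feeds them to the network is not described in the abstract.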