Towards Automated Movie Trailer Generation

Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem
2024-04-04
Abstract: Movie trailers are an essential tool for promoting films and attracting audiences. However, the process of creating trailers can be time-consuming and expensive. To streamline this process, we propose an automatic trailer generation framework that generates plausible trailers from a full movie by automating shot selection and composition. Our approach draws inspiration from machine translation techniques and models the movies and trailers as sequences of shots, thus formulating the trailer generation problem as a sequence-to-sequence task. We introduce Trailer Generation Transformer (TGT), a deep-learning framework utilizing an encoder-decoder architecture. The TGT movie encoder is tasked with contextualizing each movie shot representation via self-attention, while the autoregressive trailer decoder predicts the feature representation of the next trailer shot, accounting for the relevance of shots' temporal order in trailers. Our TGT significantly outperforms previous methods on a comprehensive suite of metrics.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses automatic movie trailer generation. Creating a trailer by hand is time-consuming and expensive, and it requires expert knowledge. The researchers propose a framework, inspired by machine translation techniques, that models movies and trailers as sequences of shots and thereby casts trailer generation as a sequence-to-sequence task. They introduce Trailer Generation Transformer (TGT), a deep-learning framework with an encoder-decoder architecture: the movie encoder contextualizes each movie shot via self-attention, and the autoregressive trailer decoder predicts the feature representation of the next trailer shot, taking into account the temporal order of shots in the trailer. Unlike previous methods that treat the task as binary classification or ranking, TGT accounts for how shots are combined and predicts continuous feature representations, and it outperforms those methods on multiple metrics. The paper also constructs a large-scale dataset and designs new benchmarks to foster further research on automatic trailer generation.
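The sequence-to-sequence framing described above can be illustrated with a toy sketch. This is not the authors' TGT implementation: there are no learned weights, the "decoder" conditions only on the most recently chosen shot, and the step that maps a predicted feature back to a movie shot (nearest unused shot by cosine similarity) is an assumption made here for illustration. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def encode_movie(shots):
    """Toy encoder: every shot attends to all movie shots (self-attention)."""
    return [attend(s, shots, shots) for s in shots]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def generate_trailer(shots, length, start=0):
    """Greedily pick up to `length` distinct shot indices, one at a time."""
    encoded = encode_movie(shots)
    chosen = [start]
    while len(chosen) < min(length, len(shots)):
        # Toy decoder step: attend from the last chosen shot to the movie
        # to get a "predicted" feature for the next trailer shot.
        predicted = attend(encoded[chosen[-1]], encoded, encoded)
        # Assumed retrieval step: nearest unused shot by cosine similarity.
        candidates = [i for i in range(len(shots)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: cosine(predicted, encoded[i])))
    return chosen


# Hypothetical 2-D shot features for a four-shot "movie".
movie = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
trailer = generate_trailer(movie, length=3)
```

Here `trailer` is an ordered list of shot indices. The order matters because each choice depends on the previous one, which is the property that separates an autoregressive formulation from scoring each shot independently.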