MFMGC: A Multi-modal Data Fusion Model for Movie Genre Classification

Xingchen Yang, Qian Zhou, Wei Chen, Lei Zhao
DOI: https://doi.org/10.1007/978-3-031-46664-9_45
2023-01-01
Abstract: Movie Genre Classification (MGC) is a classic multi-label task that aims to assign each movie to one or more genres. Existing studies have proposed many approaches for this task based on multi-modal data (e.g., synopses, posters, and trailers). Despite their significant contributions, these approaches usually fuse multi-modal information with simple operations, e.g., concatenation or weighted sum, and therefore fail to effectively capture the interactive information between modalities. In addition, movies with significant overlap in directors and actors tend to share the same genres. Previous studies have ignored this information, although it could potentially improve MGC performance. To address these shortcomings, we propose a Multi-modal data Fusion Model for MGC (MFMGC), which includes two modules: Multi-modal Data Fusion (MDF) and Movie Graph Representation Learning (MGRL). In MDF, we carefully design a fusion layer based on the attention mechanism to effectively capture the interactive information among modalities. In MGRL, we construct a movie graph to extract the structural information between movies. Specifically, the graph is constructed based on the overlap of movies’ directors, screenwriters, and actors, and each node in the graph has multi-modal attributes. Experiments conducted on the Moviescope and MovieBricks datasets demonstrate the superiority of MFMGC over state-of-the-art approaches.
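The abstract says the MGRL movie graph links movies by overlapping directors, screenwriters, and actors, but it does not give the exact edge rule or the graph model used. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the input schema, the names `build_movie_graph` and `mean_aggregate`, the `min_shared` threshold, and the use of the shared-person count as the edge weight are all hypothetical. The parameter-free weighted mean aggregation is a simplified stand-in for a learned graph neural network layer, shown only to illustrate how connected movies' multi-modal node attributes could inform each other.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical schema (not from the paper): role name -> set of people.
Crew = Dict[str, Set[str]]
Edge = Tuple[str, str]


def build_movie_graph(
    movies: Dict[str, Crew],
    roles: Tuple[str, ...] = ("directors", "screenwriters", "actors"),
    min_shared: int = 1,
) -> Dict[Edge, int]:
    """Return weighted undirected edges {(movie_a, movie_b): shared_count}.

    Assumption: people are matched within the same role, an edge exists when
    two movies share at least `min_shared` people, and the edge weight is the
    number of shared people.
    """
    edges: Dict[Edge, int] = {}
    for a, b in combinations(sorted(movies), 2):
        shared = sum(
            len(movies[a].get(role, set()) & movies[b].get(role, set()))
            for role in roles
        )
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


def mean_aggregate(
    features: Dict[str, List[float]],
    edges: Dict[Edge, int],
) -> Dict[str, List[float]]:
    """One round of weighted mean aggregation over each node and its neighbours.

    A simplified, parameter-free stand-in for a GNN layer: every node keeps a
    self-loop of weight 1.0 and averages its features with its neighbours',
    weighting each neighbour by the edge weight.
    """
    neighbours: Dict[str, List[Tuple[str, float]]] = {m: [(m, 1.0)] for m in features}
    for (a, b), w in edges.items():
        neighbours[a].append((b, float(w)))
        neighbours[b].append((a, float(w)))
    out: Dict[str, List[float]] = {}
    for m, nbrs in neighbours.items():
        total = sum(w for _, w in nbrs)
        dim = len(features[m])
        out[m] = [sum(w * features[n][i] for n, w in nbrs) / total for i in range(dim)]
    return out


if __name__ == "__main__":
    # Toy data: m1 and m2 share a director; m3 shares no one with them.
    movies = {
        "m1": {"directors": {"D1"}, "actors": {"A1", "A2"}},
        "m2": {"directors": {"D1"}, "actors": {"A3"}},
        "m3": {"directors": {"D2"}, "actors": {"A4"}},
    }
    edges = build_movie_graph(movies)
    print(edges)  # {('m1', 'm2'): 1}

    # Toy 2-dimensional stand-ins for fused multi-modal node attributes.
    features = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [0.2, 0.8]}
    print(mean_aggregate(features, edges))
```

Matching people only within the same role is one possible design choice; a variant could pool all roles into a single set so that, for example, a director of one film who acts in another also creates a link.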