VIEMF: Multimodal metaphor detection via visual information enhancement with multimodal fusion
Xiaoyu He, Long Yu, Shengwei Tian, Qimeng Yang, Jun Long, Bo Wang
DOI: https://doi.org/10.1016/j.ipm.2024.103652
IF: 7.466
2024-01-25
Information Processing & Management
Abstract: In this paper, we study multimodal metaphor detection to obtain real semantic meaning from multiple heterogeneous information sources. Existing approaches mainly suffer from two drawbacks: (1) they focus on textual aspects and overlook the characteristics of visual metaphor information, and (2) they lack efficient methods for fusing multimodal metaphor features. To address the first issue, we propose a visual information enhancement method based on dual-granularity visual feature fusion, which yields complete metaphorical visual features. To achieve bidirectional interaction among multimodal metaphor features, we further develop a multi-interactive crossmodal residual network (MCRN) that fuses the consistent and complementary information between different modalities, and we design a progressive fusion strategy to enhance the iterative fusion ability of the model. We extensively evaluate the proposed method on the popular Met-meme metaphor detection benchmark, where it outperforms the existing state-of-the-art methods by a large margin, with F1 score improvements ranging from 1.47% to 2.55% across languages. In addition, we extend the evaluation to the Sarcasm dataset to validate the ability of the model to perceive semantic contrasts and meaning transformations; there, the experimental results are superior to those of a strong baseline model.
computer science, information systems; information science & library science
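
The abstract describes bidirectional interaction between modalities with residual connections, repeated over several fusion steps. The sketch below illustrates that general idea only and is not the paper's MCRN, whose details the abstract does not give. It assumes plain scaled dot-product cross-attention, a fixed number of rounds, and mean pooling; the names `cross_attend`, `fuse`, and `rounds` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """For each query vector, return an attention-weighted mix of context vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every context vector.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)])
    return out

def add(xs, ys):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def mean_pool(xs):
    """Average a list of vectors into a single vector."""
    return [sum(col) / len(xs) for col in zip(*xs)]

def fuse(text, image, rounds=2):
    """Assumed illustration: both modalities attend to each other, each update
    is added back to its input (residual), and the exchange is repeated."""
    for _ in range(rounds):
        new_text = add(text, cross_attend(text, image))    # text attends to image
        new_image = add(image, cross_attend(image, text))  # image attends to text
        text, image = new_text, new_image
    # Concatenate the pooled representations of both modalities.
    return mean_pool(text) + mean_pool(image)

# Toy features: 3 text tokens and 2 image regions, each of dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.5, 0.5], [1.0, -1.0]]
fused = fuse(text_feats, image_feats)
print(len(fused))  # 4
```

Both updates in a round are computed from the same inputs, so neither modality sees the other's refreshed features until the next round. A real model would add learned projections and normalization, which this sketch omits.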