Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media

Liang He, Hongke Wang, Zhen Wu, Jianbing Zhang, Xinyu Dai, Jiajun Chen
DOI: https://doi.org/10.1145/3664647.3680995
2024-01-01
Abstract: The surge of multimedia content on the internet has made multimodal relation extraction vital for applications such as intelligent search and knowledge graph construction. As a rich source of image-text data, social media plays a crucial role in populating knowledge bases. However, the noise present in social media content poses a challenge for multimodal relation extraction. Current methods focus on extracting relevant information from images to improve model performance but often overlook the importance of global image information. In this paper, we propose FocalMRE, a novel multimodal relation extraction method that leverages image focal augmentation, focal attention, and gating mechanisms. FocalMRE enables the model to concentrate on the image's focal regions while still making effective use of the image's global information. Through its gating mechanisms, FocalMRE optimizes the multimodal fusion strategy, allowing the model to select the most relevant augmented regions and thereby overcome noise interference in relation extraction. Experimental results on the public MNRE dataset show that FocalMRE achieves robust and significant performance advantages on the multimodal relation extraction task, especially in scenarios with high noise, long-tail distributions, and limited resources. The code is available at https://github.com/NJUNLP/FocalMRE.
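To make the idea of attending over focal regions and gating the result concrete, here is a minimal sketch of a generic gated attention fusion. It is not FocalMRE's actual formulation (the repository linked above is the authority for that); the function names, the scaled dot-product relevance score, the single scalar gate, and the toy list-based vectors are all illustrative assumptions. The candidate "regions" are assumed to include both a global-image feature and focal-crop features, so the text decides how much each contributes, and the gate decides how much visual information to admit at all.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    text: Vector,
    regions: Sequence[Vector],
    gate_weight: Vector,
    gate_bias: float = 0.0,
) -> Tuple[Vector, Vector, float]:
    """Hypothetical gated fusion of a text vector with image-region vectors.

    regions: assumed to hold a global-image feature plus focal-crop features,
    all with the same dimension d as `text`.
    gate_weight: hypothetical learned weights of length 2*d.
    Returns (fused vector, per-region attention weights, gate value).
    """
    d = len(text)
    if not regions or any(len(r) != d for r in regions):
        raise ValueError("regions must be non-empty and match the text dimension")
    if len(gate_weight) != 2 * d:
        raise ValueError("gate_weight must have length 2 * d")

    # 1. Relevance of each region to the text (scaled dot product).
    scores = [dot(text, r) / math.sqrt(d) for r in regions]
    weights = softmax(scores)

    # 2. Attention-pooled visual vector: a weighted sum of the regions.
    visual = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]

    # 3. Scalar gate in (0, 1) computed from the concatenated [text; visual].
    g = sigmoid(dot(gate_weight, text + visual) + gate_bias)

    # 4. Convex combination: g near 0 keeps mostly text (suppresses noisy images).
    fused = [g * v + (1.0 - g) * t for t, v in zip(text, visual)]
    return fused, weights, g


if __name__ == "__main__":
    text = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0]]  # e.g. a global view and one focal crop
    fused, weights, g = gated_fusion(text, regions, [0.0] * 4)
    print(weights, g, fused)
```

In this sketch the softmax lets the most text-relevant region dominate while the other views still contribute, and a strongly negative gate pre-activation pushes the output back toward the pure text representation, which is one plausible way a gate can limit interference from noisy images.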
What problem does this paper attempt to address?