Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy

Xuming Hu
DOI: https://doi.org/10.1145/3539618.3591790
2023-07-18
Abstract: Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) are information retrieval tasks that aim to recognize entities and extract the relations among them using information from multiple modalities, such as text and images. Although current methods have tried a variety of modality fusion approaches to enrich the information in the text, they have not considered the large amount of data that is readily available through internet retrieval. We therefore attempt to retrieve real-world text related to images, objects, and entire sentences from the internet, and use this retrieved text as input for cross-modal fusion to improve the performance of entity and relation extraction on the text.
Computer Science