Multi-modal news event detection with external knowledge
Zehang Lin, Jiayuan Xie, Qing Li
DOI: https://doi.org/10.1016/j.ipm.2024.103697
IF: 7.466
2024-05-01
Information Processing & Management
Abstract: News event detection involves the identification and categorization of significant happenings or occurrences from social media data. Recent work has typically relied on datasets collected solely based on event-related keywords. However, datasets collected with such keywords tend to oversimplify the task. They reduce the contribution of non-text modalities and do not fully capture real-world scenarios where news events involve diverse expressions and media forms. To address these limitations, we introduce a News Event Detection (NED) dataset, comprising 17,366 posts with text-image pairs from Twitter, annotated with 40 real-world events. Unlike previous datasets, our NED dataset is collected based on users’ hashtags rather than keywords. This method captures more diverse event-related content and avoids the limitations of keyword searches. Additionally, we propose a Multi-modal Fusion with External Knowledge (MFEK) model to address the out-of-distribution (OOD) issue commonly encountered in news event detection. The MFEK model features a text enrichment module that leverages image semantic information to enhance the textual content, a knowledge extraction module that extracts explicit and implicit external knowledge to mitigate the OOD issue, and a knowledge-aware feature fusion module that employs a co-attention mechanism to integrate external knowledge, text, and images, while filtering out irrelevant information. Extensive experiments validate the superior performance of our MFEK model on the NED dataset for the task of news event detection, achieving a 5.48% increase in the F1 score compared to the current state-of-the-art model. The NED dataset is available at https://github.com/RetrainIt/NED.
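The co-attention fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' MFEK implementation: the function names (`attend`, `co_attention_fuse`), the use of plain scaled dot-product attention, the mean pooling, and the toy feature vectors are all assumptions chosen for illustration. The idea shown is that each modality queries the others, so features with low attention weight contribute little to the fused representation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention: each query becomes a weighted mix of values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def mean_pool(vectors):
    # Average a list of equal-length vectors into one vector.
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def co_attention_fuse(text, image, knowledge):
    # Hypothetical fusion: text and image attend to each other, then
    # knowledge features attend over the combined cross-modal context.
    text_from_image = attend(text, image, image)
    image_from_text = attend(image, text, text)
    context = text_from_image + image_from_text
    knowledge_ctx = attend(knowledge, context, context)
    # Concatenate the pooled views into a single fused vector.
    return (mean_pool(text_from_image)
            + mean_pool(image_from_text)
            + mean_pool(knowledge_ctx))

# Toy inputs (made up): 3 text tokens, 2 image regions, 2 knowledge entries, dim 4.
text = [[0.1, 0.2, 0.0, 0.4], [0.3, 0.1, 0.5, 0.0], [0.0, 0.6, 0.2, 0.1]]
image = [[0.2, 0.0, 0.1, 0.3], [0.5, 0.4, 0.0, 0.1]]
knowledge = [[0.0, 0.1, 0.3, 0.2], [0.4, 0.0, 0.2, 0.5]]
fused = co_attention_fuse(text, image, knowledge)
```

In a real model the fused vector would feed a classifier over the 40 event labels, and the attention projections would be learned rather than fixed dot products.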
computer science, information systems; information science & library science