Transformer Based Multi-modal Memory-augmented Masked Network for Air Crisis Event Detection

Yang, Yishan Zhang, Shengsheng Qian, Minghua Zhang, Kaiquan Cai
DOI: https://doi.org/10.1109/itsc57777.2023.10422016
2023-01-01
Abstract: In the social media era, the public is surrounded by information in multiple modalities (e.g., images, videos, and texts), so any air crisis event can draw global attention and provoke panic and anxiety, which makes air crisis detection the core module of air accident management. This paper formulates air crisis detection as a multi-modal problem and proposes a Transformer-based multi-modal memory-augmented masked network. First, the network leverages fine-grained features from both images and texts while filtering out redundant information across the modalities without relying solely on pairwise similarities, so as to achieve better event classification performance. Second, we construct a dedicated multi-modal dataset of texts and images collected from Twitter, named Air-CrisisMMD, which can be used for benchmark testing. Experimental results on the two datasets (Air-CrisisMMD and the publicly available CrisisMMD) show that the proposed method achieves 93.39% and 96.67% detection accuracy, outperforming state-of-the-art baseline methods for crisis event detection. This demonstrates the potential of the proposed method as an effective tool for improving accident management strategies and supporting aviation safety.