Multimodal tweet classification in disaster response systems using transformer-based bidirectional attention model

Rani Koshy, Sivasankar Elango
DOI: https://doi.org/10.1007/s00521-022-07790-5
2022-10-01
Neural Computing and Applications
Abstract: The goal of this research is to use social media to gain situational awareness in the wake of a crisis. With the developments in information and communication technologies, social media has become the de facto norm for gathering and disseminating information. We present a method for identifying informative tweets within the massive volume of user tweets on social media. Once the informative tweets have been found, emergency responders can use them to gain situational awareness so that recovery actions can be carried out efficiently. The majority of previous research has focused on either the text or the images in tweets. A thorough review of the literature indicates that text and image carry complementary information. The proposed method is a deep learning framework that uses multiple input modalities, specifically the text and image of a user-generated tweet. Our main focus is devising an improved multimodal fusion strategy. The proposed system uses transformer-based image and text models. Its main building blocks are a fine-tuned RoBERTa model for text, a Vision Transformer (ViT) model for images, a biLSTM, and an attention mechanism. We propose a multiplicative fusion strategy for combining image and text inputs. Extensive experiments were carried out on various network architectures using seven datasets spanning different types of disasters, including wildfires, hurricanes, earthquakes, and floods. Our system outperformed several state-of-the-art approaches, achieving accuracy in the range of 94–98%. The results show that capturing the interaction between multiple related modalities enhances the quality of a deep learning classifier.
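The text-plus-image pipeline described in the abstract can be sketched in plain Python. This is a minimal, illustrative sketch, not the authors' implementation: it assumes that "multiplicative fusion" means projecting a pooled text vector and an image vector to a shared dimension and taking their element-wise (Hadamard) product, and that the attention step is dot-product attention pooling over biLSTM hidden states. The helper names (`matvec`, `attention_pool`, `multiplicative_fusion`, `classify`) and all weights are hypothetical; small toy vectors stand in for the RoBERTa/biLSTM and ViT outputs, and no training is shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: List[Vector], query: Vector) -> Vector:
    """Hypothetical dot-product attention pooling: score each hidden state
    (assumed to be a biLSTM output over RoBERTa token embeddings) against a
    learned query vector, then return the softmax-weighted sum of states."""
    weights = softmax([sum(s * q for s, q in zip(h, query)) for h in states])
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


def multiplicative_fusion(text_vec: Vector, image_vec: Vector,
                          w_text: Matrix, w_image: Matrix) -> Vector:
    """Assumed fusion: project each modality to a shared size, squash with
    tanh, then combine the two projections with an element-wise product."""
    t = [math.tanh(v) for v in matvec(w_text, text_vec)]
    i = [math.tanh(v) for v in matvec(w_image, image_vec)]
    return [a * b for a, b in zip(t, i)]


def classify(fused: Vector, w_out: Matrix) -> Vector:
    """Linear head + softmax over two classes: [informative, not informative]."""
    return softmax(matvec(w_out, fused))


# Toy usage: random-looking numbers stand in for real model outputs.
text_states = [[0.2, 0.1, -0.3], [0.5, -0.2, 0.4]]  # two token states, dim 3
image_vec = [0.3, -0.1, 0.8, 0.05]                   # image embedding, dim 4
query = [1.0, 0.0, 0.5]
w_text = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]         # 3 -> 2
w_image = [[0.2, 0.0, 0.1, 0.0], [0.1, 0.1, 0.0, 0.3]]  # 4 -> 2
w_out = [[1.0, -1.0], [-1.0, 1.0]]                   # 2 -> 2 classes

text_vec = attention_pool(text_states, query)
fused = multiplicative_fusion(text_vec, image_vec, w_text, w_image)
probs = classify(fused, w_out)
print(probs)
```

Compared with simple concatenation, an element-wise product makes each fused feature large in magnitude only when both projected modality features are; if the image projection is zero, the fused vector is zero regardless of the text.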
computer science, artificial intelligence