Abstract: Yemen and Syria are suffering from the worst humanitarian crises in the world. Since 2016, 80% of the population in Yemen has been dying from hunger, and 3,886 people have died from cholera. Since 2011, 65% of the Syrian population has become refugees. During these crises, people from both countries turned to Twitter to convey their crisis-related messages. Humanitarian organizations have realized that gathering, analyzing, and classifying the content of tweets can improve their crisis rescue plans. However, most of the available crisis resources are either in English or cover only hazards and natural disasters. In addition, little is known about the terms that Arabic users most commonly use to describe crises. As a result, organizations have found it difficult to gather, annotate, and preprocess Arabic crisis tweets, extract features from them, and classify their content. This has delayed the response to the famine, cholera, and refugee crises and has cost many lives. This paper proposes methodologies for extracting unique crisis terms, building annotation criteria, and enhancing the classification of crisis-related messages in Arabic. We also produce a humanitarian crisis corpus for classifying tweets in Arabic. To build it, we used keywords from each topic produced by an LDA model to collect crisis tweets. Then, we built crisis annotation criteria guided by a unique word list generated from word embedding models. Finally, we combined topic-level, word-level, and sentence-level features and used them to train supervised classifiers. Results indicate that the proposed methods enhance the classification model's performance and increase the classifier's ability to assign positive crisis instances to the correct label. This paper also provides humanitarian organizations with tools and methods for classifying Arabic crisis messages in social media and opens new opportunities for future studies in crisis management.
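The keyword-based collection step can be illustrated with a minimal sketch. It assumes that per-topic word weights are already available from a trained LDA model; the weights, the topic names, the English placeholder tokens, and the helper names `top_keywords` and `matching_topics` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical topic-word weights standing in for the output of a trained
# LDA model. English placeholder tokens are used instead of Arabic terms.
TOPIC_WORD_WEIGHTS: Dict[str, Dict[str, float]] = {
    "famine": {"hunger": 0.12, "food": 0.09, "aid": 0.05, "children": 0.03},
    "cholera": {"cholera": 0.15, "water": 0.08, "hospital": 0.06, "aid": 0.02},
}


def top_keywords(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-weighted words of one topic."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


def matching_topics(
    tweet: str, topics: Dict[str, Dict[str, float]], k: int
) -> List[str]:
    """Return the topics whose top-k keywords appear in the tweet.

    Tokenization is a plain whitespace split, which is a simplifying
    assumption; real Arabic text would need proper normalization.
    """
    tokens = set(tweet.lower().split())
    return [
        name
        for name, weights in topics.items()
        if tokens & set(top_keywords(weights, k))
    ]


if __name__ == "__main__":
    tweet = "Hospital reports new cholera cases"
    print(matching_topics(tweet, TOPIC_WORD_WEIGHTS, k=2))
```

A tweet that matches no topic keyword returns an empty list and would be left out of the collected set. The cutoff `k` is a free parameter in this sketch.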
