Arabic Twitter Corpus for Crisis Response Messages Classification

Ghadah Adel, Yuping Wang
DOI: https://doi.org/10.1145/3377713.3377799
2019-01-01
Abstract: Twitter has been used intensively by Arab users since the crisis caused by the "Arab Spring". Humanitarian crises in countries such as Yemen and Syria have led people to use Twitter as a crisis communication tool to overcome uncertainty. A variety of crisis-related messages are sent to enquire about different issues, and humanitarian organizations use the content of these messages to form their response plans. However, most of these messages, which carry information about the humanitarian crisis, are in Arabic, and classifying their content relies on an annotated corpus. Most of the available crisis corpora are in English and focus on natural hazards. In this paper, we present the first humanitarian crisis corpus in the Arabic language, built from Twitter data and annotated into five crisis categories. Classification was then performed in three different experiments using TFIDF, lexical, morphological, and word embedding features, with supervised learning algorithms trained with and without word2vec models. Results indicate that classifiers detect more Arabic crisis-message classes when they are trained with skip-gram together with TFIDF, lexical, and morphological features. Moreover, we generated Arabic crisis word embeddings with similarity weighting, trained with word2vec techniques, for each crisis category. This paper provides a baseline for Arabic crisis-message classification in social media and opens up prospects for future work in this field.