Abstract: In light of the pandemic, the identification and processing of COVID-19-related text have emerged as critical research areas within Natural Language Processing (NLP). With a growing reliance on online portals and social media for information exchange and interaction, a surge in online textual content comprising disinformation, misinformation, fake news, and rumors has led to an infodemic on the World Wide Web. Arabic, spoken by over 420 million people worldwide, remains a significant low-resource language that lacks efficient tools or applications for detecting COVID-19-related text. Identifying COVID-19 text is also an essential prerequisite for detecting fake and toxic content associated with COVID-19. This gap hampers the COVID-19 information retrieval and processing that policymakers and health authorities need. To address this issue, this paper introduces an intelligent Arabic COVID-19 text identification system named 'AraCovTexFinder', which leverages a fine-tuned fusion-based transformer model. The proposed system aims to mitigate three challenges: a scarcity of related text corpora, substantial morphological variation in the language, and a lack of well-tuned hyperparameters. To support the proposed method, two corpora are developed: an Arabic embedding corpus (AraEC) and an Arabic COVID-19 text identification corpus (AraCoV). The study evaluates six transformer-based language models (mBERT, XLM-RoBERTa, mDeBERTa-V3, mDistilBERT, BERT-Arabic, and AraBERT), 12 deep learning models (combining Word2Vec, GloVe, and FastText embeddings with CNN, LSTM, VDCNN, and BiLSTM), and the newly introduced AraCovTexFinder. In extensive evaluation, AraCovTexFinder achieves an accuracy of 98.89 ± 0.001%, outperforming the baseline models, including the transformer-based language models and the deep learning models. This research highlights the importance of specialized tools for low-resource languages in combating the COVID-19 infodemic, which can assist policymakers and health authorities in making informed decisions.
