Abstract: In daily communication, Arabs use local dialects that are hard to identify automatically with conventional classification methods. The task becomes more challenging when dealing with under-resourced dialects belonging to the same country or region. In this paper, we start with a statistical analysis of Algerian dialects in order to capture their specificities in terms of prosodic information, which is extracted at the utterance level after a coarse-grained consonant/vowel segmentation. Based on the findings of this analysis, we propose a Hierarchical classification approach for spoken Arabic Algerian Dialect IDentification (HADID). It takes advantage of the fact that dialects are naturally structured into a hierarchy. Within HADID, a top-down hierarchical classification is applied, in which we use Deep Neural Networks (DNNs) to build a local classifier for every parent node in the dialect hierarchy. Our framework is implemented and evaluated on an Algerian Arabic dialect corpus, and the dialect hierarchy is deduced from historical and linguistic knowledge. The results reveal that, within HADID, DNNs are a better classifier than Support Vector Machines. In addition, compared with a baseline flat classification system, HADID gives an improvement of 63.5% in terms of precision. Furthermore, the overall results demonstrate the suitability of our prosody-based HADID for speaker-independent dialect identification while requiring test utterances shorter than 6 s.
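The top-down scheme with one local classifier per parent node can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-level hierarchy and its node names are placeholders rather than the Algerian dialect structure used in the paper, the 2-D feature vectors are toy values rather than prosodic features, and a nearest-centroid classifier stands in for the DNNs.

```python
from collections import defaultdict
from math import dist

# Hypothetical two-level hierarchy (placeholder names, not the paper's structure).
HIERARCHY = {
    "root": ["group_A", "group_B"],
    "group_A": ["dialect_1", "dialect_2"],
    "group_B": ["dialect_3", "dialect_4"],
}


class NearestCentroid:
    """Toy local classifier; a stand-in for the DNN used at each parent node."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: dist(x, self.centroids[c]))


def leaves_under(node):
    """Return the set of leaf dialects below a node (the node itself if it is a leaf)."""
    children = HIERARCHY.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(c) for c in children))


def train_local_classifiers(X, y):
    """Fit one classifier per parent node; targets are the child on each sample's path."""
    models = {}
    for parent, children in HIERARCHY.items():
        child_of = {leaf: c for c in children for leaf in leaves_under(c)}
        rows = [(x, child_of[label]) for x, label in zip(X, y) if label in child_of]
        if rows:
            models[parent] = NearestCentroid().fit(
                [r[0] for r in rows], [r[1] for r in rows]
            )
    return models


def predict_top_down(models, x, node="root"):
    """Descend from the root, letting each parent's classifier pick the next child."""
    while node in HIERARCHY:
        node = models[node].predict_one(x)
    return node


# Toy training data: two samples per placeholder dialect.
X_train = [[0, 0], [0, 1], [4, 0], [4, 1], [20, 0], [20, 1], [24, 0], [24, 1]]
y_train = ["dialect_1", "dialect_1", "dialect_2", "dialect_2",
           "dialect_3", "dialect_3", "dialect_4", "dialect_4"]
models = train_local_classifiers(X_train, y_train)
print(predict_top_down(models, [3.5, 0.2]))
```

A flat baseline would instead train a single classifier over all four leaf dialects. In the hierarchical version, each local classifier only has to separate the children of its own node, at the cost that an error made near the root cannot be corrected further down.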

Hierarchical Classification for Spoken Arabic Dialect Identification using Prosody: Case of Algerian Dialects

Bridging the Kuwaiti Dialect Gap in Natural Language Processing

AraFinNLP 2024: The First Arabic Financial NLP Shared Task

Automatic Arabic Dialect Identification Systems for Written Texts: A Survey

Sentiment Analysis of Arabic Tweets: Feature Engineering and A Hybrid Approach

Automatic Dialect Detection in Arabic Broadcast Speech

MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach

Arabic dialect identification in social media: A hybrid model with transformer models and BiLSTM

LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

Towards Zero-Shot Text-To-Speech for Arabic Dialects