Abstract: In this paper, we analyze the impact of five Arabic dialects on the front-end and pronunciation dictionary components of an Automatic Speech Recognition (ASR) system. We use the ASR system's phonetic decision tree as a diagnostic tool to compare the robustness of MFCC and MLP front-ends to dialectal variations in the speech data, and find that MLP Bottle-Neck features are less robust to such variations. We also perform a rule-based analysis of the pronunciation dictionary, which enables us to identify dialectal words in the vocabulary and automatically generate pronunciations for unseen words. We show that our technique produces pronunciations with an average phone error rate of 9.2%.

The Arabic language is characterized by its multitude of dialects. Although Modern Standard Arabic (MSA) is used in writing, TV/radio broadcasts, and formal communication, informal communication is typically carried out in one of the regional dialects of Arabic. Dialectal variations influence the pronunciation dictionary, the acoustic model, and the language model of an ASR system. Previous work on dialectal Arabic ASR includes cross-dialectal data sharing (1) and improved pronunciation and language modeling (2, 3), among others.

In this paper, we describe our experiments on a dialectal Arabic speech database, focusing on how different front-ends and the pronunciation dictionary behave under dialectal variation between speakers. We evaluate Mel-Frequency Cepstral Coefficient (MFCC) and Multi-Layer Perceptron (MLP) front-ends on their ability to handle the variations that arise from different dialects. Extending our previous work on gender normalization (4), we use phonetic decision trees as a diagnostic tool to analyze the influence of dialect on the clustered models. We introduce questions pertaining to dialect, in addition to the context questions, when building the decision tree. We then build the tree to cluster the contexts and count the leaves that belong to branches with dialectal questions. The ratio of such 'dialectal' models to the total number of models is used as a measure of dialect normalization: the higher the ratio, the more models are affected by dialect and hence the less normalization is achieved, and vice versa.

We further extend our analysis to the pronunciation dictionary, where we investigate ways to generate rule-based pronunciations for unseen words in a dialect with minimal manual effort. Our setup features a 'Pan-Arabic' dictionary, which contains pronunciations typically found in five Arabic dialects. We analyze the pronunciation variants in this common dictionary using acoustic model alignments to derive the dialect-specific pronunciations for each word. These pronunciations form the input to our rule-learning algorithm, which maps word pronunciations from one dialect to another. The learned rules are then used to generate pronunciations for unseen words, and their accuracy is estimated.
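As an illustration of the dialect-normalization measure described above, the following minimal sketch counts the leaves of a binary decision tree that lie below at least one dialect question and divides by the total number of leaves. The `Node` class, the question strings, and the toy tree are hypothetical and are not taken from the paper; the sketch also assumes that a leaf counts as 'dialectal' whenever any question on its path from the root is a dialect question.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """A node of a binary phonetic decision tree (hypothetical structure)."""
    question: Optional[str] = None      # None marks a leaf (a clustered model)
    is_dialect_question: bool = False   # True if the question asks about dialect
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def count_leaves(node: Node, under_dialect: bool = False) -> Tuple[int, int]:
    """Return (dialectal_leaves, total_leaves) for the subtree rooted at node."""
    if node.question is None:
        return (1 if under_dialect else 0), 1
    # Once a dialect question is seen, every leaf below it is 'dialectal'.
    flag = under_dialect or node.is_dialect_question
    d_yes, t_yes = count_leaves(node.yes, flag)
    d_no, t_no = count_leaves(node.no, flag)
    return d_yes + d_no, t_yes + t_no


def dialectal_ratio(root: Node) -> float:
    """Ratio of dialectal leaves to all leaves; higher means less normalization."""
    dialectal, total = count_leaves(root)
    return dialectal / total


# Toy tree: a context question at the root, a dialect question on one branch.
tree = Node(
    question="left-context is a vowel?",
    yes=Node(
        question="speaker dialect is A?",
        is_dialect_question=True,
        yes=Node(),
        no=Node(),
    ),
    no=Node(),
)

print(count_leaves(tree))      # (2, 3)
print(dialectal_ratio(tree))   # 2 of 3 leaves sit below a dialect question
```

Under these assumptions, comparing the ratio obtained for trees built on two different front-ends would indicate which front-end leaves more dialect-dependent structure in the clustered models.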
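The accuracy of generated pronunciations is reported as a phone error rate. The text does not spell out how this rate is computed; the sketch below assumes the common definition, namely the Levenshtein (edit) distance between the reference and generated phone sequences, summed over all words and divided by the total number of reference phones. The phone symbols and word pairs are made-up placeholders, not data from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference phone
                cur[j - 1] + 1,          # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = cur
    return prev[-1]


def phone_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total edit distance divided by the total number of reference phones."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total


# Placeholder (reference, generated) pronunciations for two words.
pairs = [
    (["k", "a", "l", "b"], ["k", "a", "l", "b"]),  # exact match
    (["q", "a", "l", "b"], ["g", "a", "l", "b"]),  # one substitution
]

print(phone_error_rate(pairs))  # 0.125: one error over eight reference phones
```

Pooling errors over all reference phones, rather than averaging per-word rates, keeps short words from dominating the figure; whether the paper's 9.2% was computed this way is not stated.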
