Abstract: In this paper, we analyze the impact of five Arabic dialects on the front-end and pronunciation dictionary components of an Automatic Speech Recognition (ASR) system. We use the ASR system's phonetic decision tree as a diagnostic tool to compare the robustness of MFCC and MLP front-ends to dialectal variations in the speech data, and find that MLP Bottle-Neck features are less robust to such variations. We also perform a rule-based analysis of the pronunciation dictionary, which enables us to identify dialectal words in the vocabulary and automatically generate pronunciations for unseen words. We show that our technique produces pronunciations with an average phone error rate of 9.2%.

The Arabic language is characterized by its multitude of dialects. Although Modern Standard Arabic (MSA) is used in writing, TV/radio broadcasts, and formal communication, informal communication is typically carried out in one of the regional dialects of Arabic. Dialectal variations influence the pronunciation dictionary, the acoustic model, and the language model of an ASR system. Previous work on dialectal Arabic ASR includes cross-dialectal data sharing (1) and improved pronunciation and language modeling (2, 3), among others.

In this paper, we describe our experiments on a dialectal Arabic speech database, where we focus on analyzing how different front-ends and the pronunciation dictionary behave under dialectal variations between speakers. We evaluate Mel-Frequency Cepstral Coefficients (MFCC) and Multi-Layer Perceptrons (MLP) on their ability to handle the variations that arise from different dialects. Extending our previous work on gender normalization (4), we use phonetic decision trees as a diagnostic tool to analyze the influence of dialect on the clustered models. We introduce questions pertaining to dialect, in addition to the usual context questions, when building the decision tree. We then build the tree to cluster the contexts and count the leaves that belong to branches with dialectal questions. The ratio of such 'dialectal' models to the total number of models is used as a measure of dialect normalization. The higher the ratio, the more models are affected by dialect and hence the less normalization the front-end provides, and vice versa.

We further extend our analysis to the pronunciation dictionary, where we investigate ways to generate rule-based pronunciations for unseen words in a dialect with minimum manual effort. Our setup features a 'Pan-Arabic' dictionary, which contains pronunciations typically found in five Arabic dialects. We analyze the pronunciation variants in this common dictionary using acoustic model alignments to derive the dialect-specific pronunciations for each word. These pronunciations form the input to our rule-learning algorithm, which maps word pronunciations from one dialect to another. The learned rules are then used to generate pronunciations for unseen words, and their accuracy is estimated.
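The dialect-normalization measure described above (dialectal leaves divided by all leaves) can be illustrated with a minimal Python sketch. The `Node` class, the `dialect=` question prefix, and the toy tree are assumptions made for illustration and are not taken from the paper's toolkit; the sketch also assumes that a leaf counts as 'dialectal' when any question on its path from the root asks about dialect.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary decision-tree node; question is None at a leaf."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def is_dialect_question(question: str) -> bool:
    # Assumption: dialect questions are tagged with a "dialect=" prefix.
    return question.startswith("dialect=")


def count_leaves(node: Node, under_dialect: bool = False) -> Tuple[int, int]:
    """Return (dialectal_leaves, total_leaves) for the subtree at node."""
    if node.question is None:
        return (1 if under_dialect else 0), 1
    flag = under_dialect or is_dialect_question(node.question)
    d_yes, t_yes = count_leaves(node.yes, flag)
    d_no, t_no = count_leaves(node.no, flag)
    return d_yes + d_no, t_yes + t_no


def dialect_ratio(root: Node) -> float:
    """Share of leaves (clustered models) that sit below a dialect question."""
    dialectal, total = count_leaves(root)
    return dialectal / total


# Toy tree: one context question at the root, one dialect question below it.
tree = Node(
    question="left=vowel",
    yes=Node(question="dialect=Egyptian", yes=Node(), no=Node()),
    no=Node(),
)
ratio = dialect_ratio(tree)  # 2 of the 3 leaves lie under a dialect question
print(f"dialectal leaf ratio: {ratio:.3f}")
```

Under this reading, a front-end that normalizes dialect well would lead the tree-building procedure to select few dialect questions, giving a ratio close to zero.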
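The evaluation of generated pronunciations can be sketched in the same spirit. The text does not spell out how the phone error rate is computed, so the sketch below assumes the common definition: total Levenshtein edit distance between generated and reference phone sequences, divided by the total number of reference phones. The single context-free substitution rule and the phone strings are invented for illustration and are far simpler than learned dialect-mapping rules would be.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def apply_rules(phones: Sequence[str], rules: Dict[str, str]) -> List[str]:
    """Apply context-free phone substitutions (a simplifying assumption)."""
    return [rules.get(p, p) for p in phones]


def phone_error_rate(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total edit errors over total reference phones, for (ref, hyp) pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total


# Hypothetical rule and data: source-dialect phones mapped to a target dialect.
rules = {"q": "g"}
source = [["q", "a", "l", "b"], ["q", "a", "h", "w", "a"]]
reference = [["g", "a", "l", "b"], ["g", "a", "h", "w", "e"]]

generated = [apply_rules(p, rules) for p in source]
per = phone_error_rate(list(zip(reference, generated)))
print(f"phone error rate: {per:.3f}")
```

In this toy case the rule fixes the initial consonant in both words but misses the final vowel of the second, so one of nine reference phones is wrong.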

Analysis of Dialectal Influence in Pan-Arabic ASR.

A classification benchmark for Arabic alphabet phonemes with diacritics in deep neural networks

Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation