Abstract: The Irish language is rich in its diversity of dialects and accents. This compounds the difficulty of creating a speech recognition system for the low-resource language, as such a system must contend with a high degree of variability with limited corpora. A recent study investigating dialect bias in Irish ASR found that balanced training corpora gave rise to unequal dialect performance, with performance for the Ulster dialect being consistently worse than for the Connacht or Munster dialects. Motivated by this, the present experiments investigate spoken dialect identification of Irish, with a view to incorporating such a system into the speech recognition pipeline. Two acoustic classification models are tested, XLS-R and ECAPA-TDNN, in conjunction with a text-based classifier using a pretrained Irish-language BERT model. The ECAPA-TDNN, particularly a model pretrained for language identification on the VoxLingua107 dataset, performed best overall, with an accuracy of 73%. This was further improved to 76% by fusing the model's outputs with the text-based model. The Ulster dialect was most accurately identified, with an accuracy of 94%; however, the model struggled to disambiguate between the Connacht and Munster dialects, suggesting a more nuanced approach may be necessary to robustly distinguish between the dialects of Irish.

What problem does this paper attempt to address?

This paper addresses the unequal performance of Automatic Speech Recognition (ASR) systems across the dialects of Irish, a low-resource language. A recent study found that, even with a dialect-balanced training corpus, recognition performance for the Ulster dialect was consistently worse than for the Connacht and Munster dialects. Motivated by this, the authors investigate spoken Dialect Identification (DID), with a view to incorporating such a system into the speech recognition pipeline. The work covers three areas:

1. **Developing a spoken dialect identification system**: Two acoustic classification models (XLS-R and ECAPA-TDNN) are tested, along with a text classifier based on a pretrained Irish-language BERT model.
2. **Fusing acoustic and text models**: The outputs of the best acoustic model are combined with those of the text model to improve identification accuracy.
3. **Analyzing the acoustic features of different dialects**: Model embeddings are visualized to explore acoustic similarities and differences between the dialects.

The best results come from an ECAPA-TDNN model pretrained for language identification on the VoxLingua107 dataset, which achieves an accuracy of 73%; fusing its outputs with the text-based model raises this to 76%. The Ulster dialect is identified most accurately (94%), but the model struggles to distinguish between the Connacht and Munster dialects, which suggests a more nuanced approach may be needed to separate the dialects of Irish robustly.
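The fusion of acoustic and text model outputs described above can be illustrated with a minimal sketch. The text here does not say how the authors combine the two models, so this assumes a simple late fusion: a weighted average of the per-dialect probabilities from each classifier. The function names, the weight of 0.7, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List

# Dialect labels, in the order used by both (hypothetical) classifiers.
DIALECTS = ["Ulster", "Connacht", "Munster"]


def fuse_posteriors(acoustic: List[float], text: List[float],
                    acoustic_weight: float = 0.7) -> List[float]:
    """Weighted average of two per-dialect probability distributions.

    Assumption: both inputs are probabilities over the same label order.
    The weight is an illustrative value, not one reported by the authors.
    """
    if len(acoustic) != len(text):
        raise ValueError("both models must score the same set of dialects")
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")
    fused = [acoustic_weight * a + (1.0 - acoustic_weight) * t
             for a, t in zip(acoustic, text)]
    total = sum(fused)
    # Renormalize so the fused scores still sum to 1.
    return [p / total for p in fused]


def predict_dialect(acoustic: List[float], text: List[float],
                    acoustic_weight: float = 0.7) -> str:
    """Return the dialect label with the highest fused probability."""
    fused = fuse_posteriors(acoustic, text, acoustic_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return DIALECTS[best]


# Made-up scores: the acoustic model narrowly prefers Connacht,
# while the text model clearly prefers Munster.
acoustic_probs = [0.10, 0.46, 0.44]
text_probs = [0.05, 0.25, 0.70]

print(predict_dialect(acoustic_probs, text_probs))  # prints "Munster"
```

In this made-up case the acoustic model alone would pick Connacht, and the text model's stronger preference tips the fused decision to Munster. That is the kind of Connacht/Munster ambiguity where a second information source could plausibly help, although whether the reported gain from 73% to 76% arises this way is not stated here.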

Towards spoken dialect identification of Irish

Low-resource speech recognition and dialect identification of Irish in a multi-task framework

Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?

Investigating model performance in language identification: beyond simple error statistics

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Improving Language Identification of Accented Speech

Dialect Identification Using Spectral and Prosodic Features on Single and Ensemble Classifiers

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

Designing a System to Recognize Main Arabic Dialects

CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

Yet Another Model for Arabic Dialect Identification

Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers

A Strategic Approach for Robust Dysarthric Speech Recognition

LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

Literary and Colloquial Dialect Identification for Tamil using Acoustic Features

Improving Speech Recognition for African American English With Audio Classification

Speaker, Accent, and Language Identification Using Multilingual Phone Strings

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages