Speaker independent recognition of low-resourced multilingual Arabic spoken words through hybrid fusion
Sunakshi Mehra, Virender Ranga, Ritu Agarwal, Seba Susan
DOI: https://doi.org/10.1007/s11042-024-18804-w
IF: 2.577
2024-03-15
Multimedia Tools and Applications
Abstract: This article introduces a supervised strategy for improving spoken word recognition on a resource-limited multilingual dataset, focusing specifically on the Arabic language. Existing methodologies often neglect the critical influence of morphology and phonology on the comprehension of spoken language. The Multilingual Spoken Words Corpus comprises audio files in the OPUS format. Our approach uses the pre-trained Arabic Large xlsr-Wav2Vec2-53 transformer model to extract text transcripts in two distinct forms: Buckwalter transliterations and Arabic script. For the Buckwalter transliteration form of the transcripts, we adopt the CMU pronouncing dictionary for phonetic representation. Specifically, a specialized Arabic-based grapheme-to-phoneme model converts the Buckwalter transliterations into phonemes. These phonemes are then transformed into vectors using FastText's character n-gram-based subword embeddings. For the Arabic script form, a stemming process is applied, and the stemmed text is converted into unigrams. FastText word embeddings are again used to represent these unigrams as vectors. To maintain uniformity, the vectors are concatenated and padded in both cases. For classification, a three-layered dense model, augmented by batch normalization, processes the accumulated vectors and generates probabilistic scores. The final outcomes are obtained by averaging the results from both forms. Comparative evaluation against the state-of-the-art (SOTA) approach substantiates the accuracy of this methodology. Our method demonstrates promising results, indicating its potential to advance spoken word recognition in complex multilingual contexts.
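The padding and score-averaging steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pad_vectors`, `softmax`, `average_fusion`, `predict`), the zero padding value, and the example logits are all assumptions made for illustration, and the two logit lists merely stand in for the outputs of the phoneme-based and Arabic-script-based classifiers.

```python
import math


def pad_vectors(vectors, length, pad_value=0.0):
    """Right-pad (or truncate) each vector to a fixed length.

    Assumption: zero padding on the right; the abstract does not say
    which padding scheme is used.
    """
    return [
        list(v[:length]) + [pad_value] * max(0, length - len(v))
        for v in vectors
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(probs_a, probs_b):
    """Average two per-class probability vectors (score-level fusion)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both branches must score the same classes")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


def predict(probs):
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits from the two branches for a 3-class problem.
phoneme_logits = [2.0, 0.5, 0.1]   # Buckwalter/phoneme branch (made up)
script_logits = [0.2, 1.5, 0.3]    # Arabic-script branch (made up)

fused = average_fusion(softmax(phoneme_logits), softmax(script_logits))
predicted_class = predict(fused)
```

Averaging the two probability vectors keeps the result a valid distribution, so the fused scores can be read the same way as the scores of either branch on its own.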
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering