The evaluation of a code-switched Sepedi-English automatic speech recognition system

Amanda Phaladi, Thipe Modipa

2024-03-11

Abstract: Speech technology is a field that encompasses various techniques and tools used to enable machines to interact with speech, such as automatic speech recognition (ASR), spoken dialog systems, and others, allowing a device to capture spoken words through a microphone from a human speaker. End-to-end approaches such as Connectionist Temporal Classification (CTC) and attention-based methods are the most widely used for the development of ASR systems. However, these techniques have mostly been used in research and development for high-resourced languages with large amounts of speech data for training and evaluation, leaving low-resource languages relatively underdeveloped. While the CTC method has been successfully used for other languages, its effectiveness for the Sepedi language remains uncertain. In this study, we present the evaluation of a Sepedi-English code-switched automatic speech recognition system. This end-to-end system was developed using the Sepedi Prompted Code Switching corpus and the CTC approach. The performance of the system was evaluated using both the NCHLT Sepedi test corpus and the Sepedi Prompted Code Switching corpus. The model produced the lowest WER of 41.9%; however, it faced challenges in recognizing Sepedi-only text.
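The abstract reports performance as word error rate (WER). As a point of reference, the sketch below shows the standard definition of WER: the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. It is a minimal illustration under the assumption of whitespace tokenization; the function name `wer` and the example strings are hypothetical and not taken from the paper, whose exact scoring tool is not stated here.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
score = wer("a b c d", "a x c")
```

Under this definition a WER of 41.9% means roughly 42 word errors per 100 reference words. WER can exceed 100% when the hypothesis contains many insertions.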

Audio and Speech Processing, Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper sets out to develop and evaluate an automatic speech recognition (ASR) system for speech that code-switches between Sepedi, a low-resource language, and English. Specifically, the researchers examine how well the Connectionist Temporal Classification (CTC) method works in Sepedi-English code-switching scenarios. Because Sepedi lacks large amounts of training data, the study explores whether CTC is effective in this setting and tunes model performance by adjusting the number of filters in the convolutional layer. The paper notes that existing ASR technologies mainly target high-resource languages, while development for low-resource languages like Sepedi lags behind. The work therefore matters both for Sepedi ASR and as a reference for building ASR systems for other low-resource languages. The researchers hope that an efficient code-switching ASR system can promote the use of speech recognition in multilingual settings, such as language learning, speech-to-text, and voice-controlled devices.
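To make the CTC method mentioned above more concrete, the sketch below illustrates the collapsing rule that CTC uses to map a frame-level label sequence to an output sequence: merge consecutive repeated labels, then drop the blank symbol. This is a generic illustration of greedy (best-path) CTC decoding, not the paper's implementation; the function name, the choice of index 0 for the blank, and the example label indices are all assumptions.

```python
from typing import List


def ctc_collapse(frame_labels: List[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame label sequence using the CTC rule.

    Consecutive duplicates are merged first, then blanks are removed.
    A blank between two identical labels keeps them as separate outputs.
    Assumption: `blank` is the index reserved for the CTC blank symbol.
    """
    output: List[int] = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


# Hypothetical per-frame argmax labels from an acoustic model.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 2]
decoded = ctc_collapse(frames)  # [1, 2, 2]
```

In a full system the frame labels would come from taking the most probable symbol at each time step of the network output, and the decoded indices would be mapped back to characters or subword units shared by both languages.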

