Indonesian Automatic Speech Recognition with XLSR-53

Panji Arisaputra, Amalia Zahra
DOI: https://doi.org/10.18280/isi.270614
2023-08-20
Abstract: This study focuses on the development of Indonesian Automatic Speech Recognition (ASR) using the XLSR-53 pre-trained model, where XLSR stands for cross-lingual speech representations. The pre-trained model is used to significantly reduce the amount of non-English training data required to achieve a competitive Word Error Rate (WER). The total amount of data used in this study is 24 hours, 18 minutes, and 1 second: (1) TITML-IDN, 14 hours and 31 minutes; (2) Magic Data, 3 hours and 33 minutes; and (3) Common Voice, 6 hours, 14 minutes, and 1 second. With a WER of 20%, the model built in this study is competitive with similar models evaluated on the Common Voice test split. Adding a language model lowers the WER by about 8 percentage points, from 20% to 12%. Thus, this study improves on previous research and contributes to a better Indonesian ASR built with a smaller amount of data.
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses Automatic Speech Recognition (ASR) for Indonesian. Specifically, the researchers use the pre-trained XLSR-53 model to significantly reduce the amount of training data required for ASR in a non-English language while still reaching a competitive Word Error Rate (WER). They combine three datasets (TITML-IDN, Magic Data, and Common Voice) for a total of 24 hours, 18 minutes, and 1 second of data. Without a language model, the model reaches a WER of 20%; with a language model, the WER drops to 12%. This indicates that the study improves on previous research outcomes, contributing to a better Indonesian ASR system while reducing the required data volume.
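Since the results above are reported as WER, a minimal sketch of how that metric is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the Indonesian example sentences are illustrative assumptions, not taken from the paper, and the paper's own evaluation tooling is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("ke") and one substitution
# ("ini" -> "itu") against a six-word reference.
example_wer = word_error_rate(
    "saya pergi ke pasar hari ini",
    "saya pergi pasar hari itu",
)
```

Under this definition, the reported drop from 20% to 12% is 8 percentage points in absolute terms, or a 40% relative reduction in word errors.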