Automatic Speech Recognition of Vietnamese for a New Large-Scale Corpus

Linh Thi Thuc Tran, Han-Gyu Kim, Hoang Minh La, Su Van Pham
DOI: https://doi.org/10.3390/electronics13050977
IF: 2.9
2024-03-05
Electronics
Abstract: Vietnamese is an under-resourced language, and demand for a large-scale, high-quality Vietnamese speech corpus is growing. We introduce a new large-scale Vietnamese speech corpus of 100.5 h collected from various audio sources on the Internet. The raw audio was processed to obtain clean speech, which was then transcribed manually. The new corpus was analyzed in terms of gender, topic, and regional dialect. The results show that the corpus has good diversity across genders, topics, and regional dialects. We also evaluated the corpus in multiple scenarios using state-of-the-art automatic speech recognition (ASR) models such as LAS and Speech-Transformer. This is the first time that these models have been applied to Vietnamese speech recognition, and they obtained reasonable results. Simulation results showed that the new corpus would be a good dataset for Vietnamese ASR tasks because it correctly reflects the difficulties of recognizing speech from different dialects and topic domains.
engineering, electrical & electronic; computer science, information systems; physics, applied