Applicability of End-to-End Deep Neural Architecture to Sinhala Speech Recognition

Buddhi Gamage, Randil Pushpananda, Thilini Nadungodage, Ruvan Weerasinghe
DOI: https://doi.org/10.4038/icter.v17i1.7273
2024-05-31
International Journal on Advances in ICT for Emerging Regions (ICTer)
Abstract: This research presents a study on applying end-to-end (e2e) deep learning models to Automatic Speech Recognition (ASR) for Sinhala, a highly inflected, low-resource language. We explore two e2e architectures, an e2e Lattice-Free Maximum Mutual Information (LF-MMI) model and a Recurrent Neural Network (RNN) model, using a restricted dataset. Statistical models trained on 40 hours of data are established as baselines for evaluation. Our pretrained end-to-end ASR models achieved a Word Error Rate (WER) of 23.38%, by far the best WER achieved for low-resource Sinhala. Our models show greater contextual independence and faster processing, which makes them more suitable for general-purpose speech-to-text conversion in Sinhala.
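For reference, WER is conventionally computed as WER = (S + D + I) / N. S, D and I are the word substitutions, deletions and insertions needed to turn the recognizer's hypothesis into the reference transcript, and N is the number of reference words. The following is a minimal Python sketch of that standard definition using a word-level edit distance. The function name word_error_rate is hypothetical, and the abstract does not say which scoring tool produced the 23.38% figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    # Total edits divided by the number of reference words
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c"))
```

For example, word_error_rate("a b c d", "a x c") returns 0.5: one substitution (b to x) plus one deletion (d) over four reference words. Because insertions are counted, WER can exceed 100%.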