Lip Synchronization Model For Sinhala Language Using Machine Learning

Ruwan Weerasinghe, Dilani Ranaweera, R. Dinalankara
DOI: https://doi.org/10.1109/ICARC61713.2024.10499753
2024-02-21
Abstract: Realistic lip-synchronized animation requires a cartoon character's voice and lip motions to be appropriately timed; this process is called “lip synchronization”. Building a talking face for languages such as English, Korean, and Portuguese has been the subject of numerous studies. Sinhala, by comparison, is a low-resource language because it has received little attention in this research. This study aims to build a frontal view of a synthetic mouth with smooth lip movements, rather than simple opening and closing, while speaking Sinhala sentences. The most difficult challenge is to match the basic sounds, “phonemes”, with the corresponding lip movements, “visemes”. The study uses a static viseme approach, deriving twenty-three (23) Sinhala viseme classes, and develops a deep learning model to map Sinhala letters to visemes. In the implemented system, text input is provided first, after which the system produces audio and the deep learning model generates a sequence of visemes for the given text. The system interface then offers three speeds for playing the visemes: fast, normal, and slow. The user interface was created in Python, and the deep learning model is integrated into the system. The model could be useful in the cartoon industry for building Sinhala-speaking cartoon characters, and for training deaf people to read lips.
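The text-to-viseme-to-playback flow described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the lookup table `LETTER_TO_VISEME` stands in for the paper's trained deep learning model, the class numbers are not the paper's 23 viseme classes, the per-viseme durations in `SPEED_SECONDS` are invented, and combining vowel signs are simply skipped, which a real system would not do.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical letter-to-viseme table. The paper uses a deep learning model
# and 23 derived classes for this step; neither is reproduced here.
LETTER_TO_VISEME = {
    "අ": 1,
    "ම": 2,
    "ප": 2,  # assumed: bilabial consonants share one mouth shape
    "ක": 3,
}
REST_VISEME = 0  # assumed neutral class for letters missing from the table

# Assumed display time per viseme, in seconds, for the three speed options.
SPEED_SECONDS = {"fast": 0.08, "normal": 0.12, "slow": 0.20}


@dataclass
class Frame:
    viseme: int
    start: float  # seconds from the start of playback
    end: float


def text_to_visemes(text):
    """Map each base letter to a viseme class, skipping spaces and
    combining marks (a simplification: vowel signs affect lip shape)."""
    visemes = []
    for ch in text:
        if ch.isspace() or unicodedata.category(ch).startswith("M"):
            continue
        visemes.append(LETTER_TO_VISEME.get(ch, REST_VISEME))
    return visemes


def schedule(visemes, speed="normal"):
    """Assign each viseme an equal-length time slot at the chosen speed."""
    step = SPEED_SECONDS[speed]
    return [Frame(v, i * step, (i + 1) * step) for i, v in enumerate(visemes)]


if __name__ == "__main__":
    for frame in schedule(text_to_visemes("මක"), speed="slow"):
        print(frame)
```

A static viseme approach of this kind shows one fixed mouth shape per class, so the playback speed only changes how long each shape stays on screen; smooth transitions between shapes would need interpolation on top of this schedule.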
Linguistics, Computer Science