Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID

Ahmad Alfani Handoyo,Chung Tran,Dessi Puji Lestari,Sakriani Sakti
DOI: https://doi.org/10.1109/O-COCOSDA64382.2024.10800604
2024-12-26
Abstract:Multilingual text-to-speech systems convert text into speech across multiple languages. In many cases, text sentences may contain segments in different languages, a phenomenon known as code-switching. This is particularly common in Indonesia, especially between Indonesian and English. Despite its significance, no research has yet developed a multilingual TTS system capable of handling code-switching between these two languages. This study addresses Indonesian-English code-switching in STEN-TTS. Key modifications include adding a language identification component to the text-to-phoneme conversion using finetuned BERT for per-word language identification, as well as removing language embedding from the base model. Experimental results demonstrate that the code-switching model achieves superior naturalness and improved speech intelligibility compared to the Indonesian and English baseline STEN-TTS models.
Computation and Language,Artificial Intelligence,Sound,Audio and Speech Processing
What problem does this paper attempt to address?