The Tacotron2-based IPA-to-Speech speech synthesis system.

Jueting Liu, Yu Lei, Minda Yao, Zemeng Liu, Guoyuan Lin, Zehua Wang, Yingchun Liu, Wei Chen
DOI: https://doi.org/10.1145/3614008.3614019
2023-01-01
Abstract: To help language learners better understand the pronunciation of a language, we propose an IPA-to-Speech speech synthesis system that aims to generate high-quality human speech from written language in IPA format. The system has two main parts: a Transformer-based G2P (grapheme-to-phoneme) converter and a Tacotron2-based speech synthesis module. The G2P converter builds the training data by converting all the English sentences in LJSpeech into their IPA forms, and the speech synthesis module generates speech from the IPA sentences. Word error rate and phoneme error rate were used to evaluate the G2P converter, and the mean opinion score was used to evaluate the speech synthesis. This work also inspired us to use the IPA format to represent dialects; in future work, we will continue this research on dialect recognition and generation.
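The abstract names word error rate and phoneme error rate as the G2P metrics but does not define them. The following is a minimal sketch of how such metrics are commonly computed for G2P, not the authors' evaluation code. It assumes that phoneme error rate is the total Levenshtein distance between predicted and reference phoneme sequences divided by the total reference length, and that word error rate is the fraction of words whose predicted pronunciation differs from the reference in any way. The function names and the sample data are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Assumed definition: total edits / total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phonemes = sum(len(r) for r in refs)
    return total_edits / total_phonemes


def word_error_rate(refs, hyps):
    """Assumed definition: share of words with any pronunciation error."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


# Hypothetical sample: one reference/predicted phoneme list per word.
refs = [["h", "ə", "l", "oʊ"], ["w", "ɜː", "l", "d"]]
hyps = [["h", "ə", "l", "oʊ"], ["w", "ɜː", "d"]]

per = phoneme_error_rate(refs, hyps)
wer = word_error_rate(refs, hyps)
print(f"PER = {per:.3f}, WER = {wer:.3f}")
```

In this sample the second word drops one phoneme, so one edit is counted against eight reference phonemes and one of the two words is marked wrong.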