The IMU Speech Synthesis Entry for Blizzard Challenge 2019

Rui Liu, Jingdong Li, Feilong Bao, Guanglai Gao
DOI: https://doi.org/10.21437/blizzard.2019-7
2019-01-01
Abstract: This paper describes the IMU speech synthesis entry for Blizzard Challenge 2019, where the task was to build a voice from Mandarin audio data. Our system is a typical end-to-end speech synthesis system. The acoustic parameters are modeled by a “Tacotron” model, and the vocoder is the Griffin-Lim algorithm. In the synthesis stage, the task is divided into the following parts: 1) segment each long sentence into short sentences at commas; 2) predict an interjection label for each word in the short sentences; 3) predict a prosodic break label for each word in the short sentences; 4) generate synthesized speech for each short sentence, which is enriched with the prosodic break labels and interjections; 5) concatenate the speech of the short sentences into one long sentence. The Blizzard Challenge listening test results show that the proposed system performed unsatisfactorily. The problems in the system are also discussed in this paper.
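The five synthesis-stage steps can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions, not the authors' implementation: the function names (`split_by_comma`, `synthesize_long_sentence`), the callable parameters, and the choice to split on both ASCII and full-width commas are all hypothetical, and the label predictors and the Tacotron plus Griffin-Lim synthesizer are passed in as placeholders rather than implemented.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Assumption: short sentences are delimited by ASCII or full-width commas.
COMMA_PATTERN = re.compile(r"[,，]")

# One word together with its interjection label and prosodic break label.
EnrichedWord = Tuple[str, int, int]


def split_by_comma(sentence: str) -> List[str]:
    """Step 1: segment a long sentence into non-empty short sentences."""
    return [part.strip() for part in COMMA_PATTERN.split(sentence) if part.strip()]


def synthesize_long_sentence(
    sentence: str,
    tokenize: Callable[[str], List[str]],
    predict_interjections: Callable[[Sequence[str]], List[int]],
    predict_breaks: Callable[[Sequence[str]], List[int]],
    synthesize: Callable[[List[EnrichedWord]], List[float]],
) -> List[float]:
    """Run steps 1-5 and return the concatenated waveform samples.

    All four callables are hypothetical stand-ins: a word segmenter, the two
    label predictors, and an acoustic model plus vocoder.
    """
    waveform: List[float] = []
    for clause in split_by_comma(sentence):              # step 1
        words = tokenize(clause)
        interjections = predict_interjections(words)     # step 2
        breaks = predict_breaks(words)                   # step 3
        enriched = list(zip(words, interjections, breaks))
        waveform.extend(synthesize(enriched))            # steps 4 and 5
    return waveform
```

With trivial stand-ins (for example `str.split` as the tokenizer and a synthesizer that returns one silent sample per word), the function can be exercised end to end before any real models are plugged in.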