The TJU-Didi-Huiyan System for Blizzard Challenge 2019

Ju Zhang, Shaotong Guo, Cheng Gong, Shuaiting Chen, Yuguang Wang, Longbiao Wang, Wei Zou, Xiangang Li
DOI: https://doi.org/10.21437/blizzard.2019-18
2019-01-01
Abstract: In this paper, we introduce an end-to-end text-to-speech system based on Tacotron 2 for Blizzard Challenge 2019. The main aim of our system is to synthesize speech that is as similar as possible to the provided voice of the real male speaker. In the front-end, we convert Chinese character sequences to Pinyin sequences with tone and prosody annotation. In the back-end, the Tacotron 2 model is adapted to predict spectrogram features. The predicted spectrograms are then used to generate 16-bit speech waveforms with the Griffin-Lim algorithm. This is our first time participating in the Blizzard Challenge, and the identifier for our system is X. Experimental results from the subjective listening tests show that our system performed well on the naturalness test compared with the Merlin benchmark.
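As a rough illustration of the front-end step described in the abstract, the following minimal sketch maps pre-segmented Chinese words to tone-numbered Pinyin tokens and inserts a boundary marker between words. It is not the authors' implementation: the lexicon `TOY_LEXICON`, the marker `#1`, and the function `to_pinyin` are hypothetical names chosen for this example, and the abstract does not specify the actual prosody label set. Tone sandhi and polyphone disambiguation are ignored.

```python
# Hypothetical toy lexicon: lexical tones only, covers just the demo input.
TOY_LEXICON = {
    "你": "ni3",
    "好": "hao3",
    "世": "shi4",
    "界": "jie4",
}

# Assumed prosody boundary marker; the paper's real label set is not given.
WORD_BOUNDARY = "#1"


def to_pinyin(words):
    """Convert pre-segmented words to Pinyin tokens with tone digits.

    A boundary marker is inserted between consecutive words. Characters
    missing from the lexicon are passed through unchanged.
    """
    tokens = []
    for i, word in enumerate(words):
        for char in word:
            tokens.append(TOY_LEXICON.get(char, char))
        if i < len(words) - 1:
            tokens.append(WORD_BOUNDARY)
    return tokens


if __name__ == "__main__":
    print(" ".join(to_pinyin(["你好", "世界"])))  # ni3 hao3 #1 shi4 jie4
```

A token sequence of this kind, rather than raw characters, would then be the input to the Tacotron 2 encoder; a production front-end would additionally need word segmentation, polyphone handling, and a trained prosody predictor.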