NUS-HLT System for Blizzard Challenge 2020

Yi Zhou, Xiaohai Tian, Xuehao Zhou, Mingyang Zhang, Grandee Lee, Riu Liu, Berrak Sisman, Haizhou Li
DOI: https://doi.org/10.21437/vcc_bc.2020-7
2020-01-01
Abstract: This paper presents the NUS-HLT text-to-speech (TTS) system for the Blizzard Challenge 2020. The challenge has two tasks: the Hub task 2020-MH1, to synthesize Mandarin Chinese given 9.5 hours of speech data from a male native speaker of Mandarin, and the Spoke task 2020-SS1, to synthesize Shanghainese given 3 hours of speech data from a female native speaker of Shanghainese. Our submitted system combines word embeddings extracted from a pre-trained language model with an end-to-end (E2E) TTS synthesizer to generate acoustic features from text input. A WaveRNN neural vocoder and a WaveNet neural vocoder are used to generate speech waveforms from the acoustic features in the MH1 and SS1 tasks, respectively. Evaluation results provided by the challenge organizers demonstrate the effectiveness of our submitted TTS system.