Word Embedding for Recurrent Neural Network Based TTS Synthesis

Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao
DOI: https://doi.org/10.1109/ICASSP.2015.7178898
2015-01-01
Abstract: Current state-of-the-art TTS synthesis can produce high-quality synthesized speech if rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (ToBI), are time-consuming to obtain because they are labeled manually, with high inconsistency among annotators. In this paper, we investigate the use of word embedding, which represents a word as a low-dimensional continuous-valued vector and is assumed to carry certain syntactic and semantic information, for bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) based TTS synthesis. Experimental results show that word embedding can significantly improve the performance of BLSTM-RNN based TTS synthesis without using ToBI and Part of Speech (POS) features.
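As a rough illustration of the idea in the abstract, the sketch below looks up a continuous-valued vector for each word and appends it to the per-phone input features, in place of hand-labeled ToBI or POS features. The abstract does not say how the embeddings are combined with the other inputs, so plain concatenation is an assumption here. The embedding table, the zero-vector fallback for unknown words, the feature values, and the names `EMBEDDINGS`, `embed_word`, and `build_inputs` are all hypothetical; the BLSTM-RNN that would consume these vectors is not shown.

```python
# Hypothetical toy embedding table (3 dimensions for readability).
# Real word embeddings would be trained on a large text corpus.
EMBEDDINGS = {
    "the": [0.1, -0.2, 0.3],
    "cat": [0.5, 0.4, -0.1],
}

# Assumed fallback for out-of-vocabulary words: an all-zero vector.
UNK = [0.0, 0.0, 0.0]


def embed_word(word):
    """Return the embedding vector for a word (case-insensitive lookup)."""
    return EMBEDDINGS.get(word.lower(), UNK)


def build_inputs(phones):
    """Build one input vector per phone.

    `phones` is a list of (phone_features, word) pairs, where `word` is the
    word the phone belongs to. Each output vector is the phone's own features
    followed by the embedding of its word (concatenation is an assumption).
    """
    return [list(feats) + embed_word(word) for feats, word in phones]


# Made-up phone-level features for a short word sequence.
phones = [
    ([1.0, 0.0], "The"),
    ([0.0, 1.0], "cat"),
    ([0.0, 1.0], "cat"),
    ([1.0, 1.0], "sat"),  # "sat" is not in the toy table
]

inputs = build_inputs(phones)
for vector in inputs:
    print(vector)
```

Every phone of the same word receives the same embedding, so word-level information is repeated across the sequence that a recurrent model would read.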