Multi-task Learning of Structured Output Layer Bidirectional LSTMs for Speech Synthesis

Runnan Li, Zhiyong Wu, Xunying Liu, Helen Meng, Lianhong Cai
DOI: https://doi.org/10.1109/icassp.2017.7953210
2017-01-01
Abstract: Recurrent neural networks (RNNs) and their bidirectional long short-term memory (BLSTM) variants are powerful sequence modelling approaches. Their inherently strong ability to capture long-range temporal dependencies allows BLSTM-RNN speech synthesis systems to produce higher-quality and smoother speech trajectories than conventional deep neural networks (DNNs). In this paper, we improve the conventional BLSTM-RNN based approach by introducing a multi-task learned structured output layer in which the spectral parameter targets are conditioned on the predicted pitch parameters. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
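The idea of a structured output layer, where the spectral prediction depends on the pitch prediction and both are trained jointly, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the layer equations, feature dimensions, or loss weighting, so the linear form of both heads, the mean-squared-error loss, the `pitch_weight` parameter, and all function and parameter names below are assumptions made for illustration. The BLSTM itself is omitted; `hidden` stands in for its per-frame outputs.

```python
import numpy as np


def init_params(hidden_dim, pitch_dim, spectral_dim, seed=0):
    """Random weights for the two output heads (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    return {
        "W_p": rng.normal(size=(hidden_dim, pitch_dim)),      # hidden -> pitch
        "b_p": np.zeros(pitch_dim),
        "W_s": rng.normal(size=(hidden_dim, spectral_dim)),   # hidden -> spectral
        "W_ps": rng.normal(size=(pitch_dim, spectral_dim)),   # pitch -> spectral
        "b_s": np.zeros(spectral_dim),
    }


def structured_output_layer(hidden, params):
    """Map per-frame hidden states (T, H) to pitch (T, P) and spectral (T, S).

    Assumption: both heads are linear, and the spectral head receives the
    pitch prediction as an extra input, so it is conditioned on pitch.
    """
    pitch = hidden @ params["W_p"] + params["b_p"]
    spectral = hidden @ params["W_s"] + pitch @ params["W_ps"] + params["b_s"]
    return pitch, spectral


def multitask_loss(pitch, spectral, pitch_target, spectral_target,
                   pitch_weight=1.0):
    """Weighted sum of the two mean-squared errors (assumed loss form)."""
    spectral_mse = float(np.mean((spectral - spectral_target) ** 2))
    pitch_mse = float(np.mean((pitch - pitch_target) ** 2))
    return spectral_mse + pitch_weight * pitch_mse


if __name__ == "__main__":
    params = init_params(hidden_dim=8, pitch_dim=2, spectral_dim=5)
    hidden = np.ones((4, 8))  # stand-in for BLSTM outputs over 4 frames
    pitch, spectral = structured_output_layer(hidden, params)
    print(pitch.shape, spectral.shape)
```

Because the pitch prediction feeds the spectral head, the gradient of the spectral error also reaches the pitch weights, which is one way the two tasks can share information in a setup like this.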