An investigation of implementation and performance analysis of DNN based speech synthesis system

Zhehuai Chen, Kai Yu
DOI: https://doi.org/10.1109/ICOSP.2014.7015070
IF: 4.729
2014-01-01
Signal Processing
Abstract: Deep neural networks (DNNs), which can model hierarchical and complex relationships between the input and output layers, have recently been applied to speech synthesis. However, it remains unclear why DNNs outperform traditional HMM-based synthesis. This paper describes several implementation details of a DNN-based speech synthesis system and compares different influencing factors, e.g., the F0 modeling method and the addition of the BAP feature. The DNN-based system is then investigated further; in particular, the Continuous F0 HMM (CF-HMM) is taken as the baseline for comparison, as its input and output features are more similar to those of the DNN-based system. Results show that the F0 modeling ability of the two systems is similar, although the CF-HMM system performs better. It seems that CF-HMM carefully strengthens the model with many techniques, while using a DNN to model F0 is still rough and needs more research. Another experiment shows that CF-HMM also does better in mcep modeling, which needs further investigation.