A data driven method for target and concatenation cost calculation with KL-Divergence in Mandarin hybrid speech synthesis

Shanfeng Liu, Zhengqi Wen, Jianhua Tao, Ya Li, Yongguo Kang
DOI: https://doi.org/10.1109/ICOSP.2014.7015069
IF: 4.729
2014-01-01
Signal Processing
Abstract: This paper presents a data-driven, KL-Divergence-based method for calculating the target cost and concatenation cost in a hybrid speech synthesis system that combines unit selection with Hidden Markov Model (HMM)-based speech synthesis. In the training stage, a set of context-dependent HMMs is estimated from the acoustic features and label information of the database. In the synthesis stage, unit candidates are pre-selected with a linear prediction model that uses context information, and the target cost and concatenation cost are then calculated with the data-driven method. The target cost is the KL-Divergence between the context-dependent HMM and the unit candidate, computed over every state; the concatenation cost is the KL-Divergence between unit candidates, computed over their first and last states. The mean and variance of each unit candidate used in the KL-Divergence calculation are estimated from the original speech data, so they differ from those of the context-dependent HMMs. Experiments show that the proposed method performs better than a traditional hybrid unit selection system.
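The two costs described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: each state is modeled as a single diagonal-covariance Gaussian, the divergence is taken in the direction KL(model || unit), state costs are summed without weights, and the concatenation cost compares the last state of the preceding candidate with the first state of the following one. The function names `kl_diag_gauss`, `target_cost`, and `concatenation_cost` are hypothetical and do not come from the paper.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def target_cost(model_states, unit_states):
    """Sum of state-wise KL between HMM states and unit candidate states.

    Each state is a (mean, variance) pair of equal-length lists.
    Assumption: unweighted sum, direction KL(model || unit).
    """
    return sum(
        kl_diag_gauss(m_mean, m_var, u_mean, u_var)
        for (m_mean, m_var), (u_mean, u_var) in zip(model_states, unit_states)
    )


def concatenation_cost(left_unit_states, right_unit_states):
    """KL between the last state of the left unit and the first of the right.

    Assumption: this pairing is one reading of "first and last states".
    """
    l_mean, l_var = left_unit_states[-1]
    r_mean, r_var = right_unit_states[0]
    return kl_diag_gauss(l_mean, l_var, r_mean, r_var)


if __name__ == "__main__":
    model = [([0.0], [1.0]), ([0.0], [1.0])]
    unit = [([0.0], [1.0]), ([1.0], [1.0])]
    print(target_cost(model, unit))
    print(concatenation_cost(model, unit))
```

In this sketch the unit candidate statistics would be estimated directly from the candidate's own speech frames, while the model statistics would come from the trained context-dependent HMMs; a symmetric divergence or per-state weights could be substituted without changing the overall structure.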