Context features based pre-selection and weight prediction in concatenation speech synthesis system

Shanfeng Liu, Zhengqi Wen, Ya Li, Jianhua Tao, Bin Liu
DOI: https://doi.org/10.1109/ISCSLP.2014.6936611
2014-01-01
Abstract: Generating natural-sounding synthesized speech remains a central challenge in speech synthesis. Experiments show that speech concatenated from units selected from a large speech corpus performs better. However, limiting the search space and predicting the weights used in the target cost calculation remain important problems. This paper presents a detailed hierarchical pre-selection method to limit the search space. After three layers of pre-selection, a set of units is retained as the candidate units. To ensure continuity in duration, a prediction model is used within the hierarchical pre-selection. In addition, the M5P algorithm, which combines decision trees with regression, is used to predict the weights needed in the target cost calculation. Experimental results show that these two approaches can generate high-quality speech.
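The two ideas in the abstract, layered pre-selection of candidate units and a weighted target cost, can be illustrated with a minimal Python sketch. The abstract does not specify the layers, the context features, or the fallback behaviour, so everything here is an assumption made for illustration: the `Unit` fields (`phone`, `tone`, `position`, `duration`), the three filter criteria, the `max_dur_dev` threshold, and the rule that a layer is skipped when it would leave no candidates. The weights are fixed placeholders, not the output of an M5P model.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # All fields are hypothetical context features, not taken from the paper.
    phone: str
    tone: int
    position: str    # e.g. position within a prosodic word
    duration: float  # seconds


def preselect(candidates, target, predicted_duration, max_dur_dev=0.05):
    """Narrow the candidate pool through three successive layers.

    Assumption: a layer that would leave no candidates is skipped,
    so the pool from the previous layer is kept.
    """
    layers = [
        lambda u: u.phone == target.phone,
        lambda u: u.tone == target.tone and u.position == target.position,
        # Duration layer: keep units close to the predicted duration.
        lambda u: abs(u.duration - predicted_duration) <= max_dur_dev,
    ]
    pool = list(candidates)
    for keep in layers:
        narrowed = [u for u in pool if keep(u)]
        if narrowed:
            pool = narrowed
    return pool


def target_cost(unit, target, predicted_duration, weights):
    """Weighted sum of per-feature sub-costs.

    In the paper the weights are predicted by M5P; here they are
    supplied by the caller as plain numbers.
    """
    sub_costs = {
        "tone": float(unit.tone != target.tone),
        "position": float(unit.position != target.position),
        "duration": abs(unit.duration - predicted_duration),
    }
    return sum(weights[name] * cost for name, cost in sub_costs.items())


corpus = [
    Unit("a", 1, "head", 0.10),
    Unit("a", 1, "head", 0.30),
    Unit("a", 2, "tail", 0.11),
    Unit("b", 1, "head", 0.10),
]
target = Unit("a", 1, "head", 0.0)  # the target's own duration is unused
weights = {"tone": 1.0, "position": 1.0, "duration": 2.0}  # placeholder values

pool = preselect(corpus, target, predicted_duration=0.12)
best = min(pool, key=lambda u: target_cost(u, target, 0.12, weights))
```

Filtering before scoring means the target cost is computed only for the units that survive all three layers, which is the point of limiting the search space.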