F0 Transformation for Emotional Speech Synthesis Using Target Approximation Features and Bidirectional Associative Memories

Zhenhua Ling, Li Gao, Lirong Dai
DOI: https://doi.org/10.11784/tdxbz201507028
2015-01-01
Abstract: In this paper, an F0 transformation method for emotional speech synthesis was proposed. Quantitative target approximation (qTA) features were used to represent the F0 contour at the syllable level, and Gaussian bidirectional associative memories (GBAM) were used to transform the syllable-level qTA parameters of synthesized neutral speech into those of target emotional recordings. In the training stage, an HMM-based statistical parametric speech synthesis system was first built using a neutral corpus as the training set. Then, using a small amount of emotional recording data, the GBAM-based transformation model was trained, with the qTA parameters extracted from neutral speech synthesized for the emotional texts serving as the source features and the qTA parameters extracted from the corresponding emotional recordings serving as the target patterns. When generating emotional speech, the trained GBAM model was used to convert the syllable-level F0 features of synthesized neutral speech into those of the target emotion. Experimental results indicate that, when only a small amount of emotional recording data is available, the proposed method achieves better emotional expressivity than an adaptation method based on maximum likelihood linear regression (MLLR).
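The syllable-level qTA representation mentioned in the abstract can be illustrated with a short sketch. It assumes the standard qTA formulation, in which each syllable has a linear pitch target x(t) = m·t + b and F0 approaches that target as a third-order critically damped system: f0(t) = x(t) + (c1 + c2·t + c3·t²)·e^(−λt), where c1, c2 and c3 are fixed by the F0 value, velocity and acceleration carried over from the end of the previous syllable. The names QTATarget, qta_syllable and qta_contour, the units (semitones and seconds), the sampling density and the at-rest initial state are illustrative assumptions, not details from the paper. The sketch only renders F0 from given (m, b, λ) values; it does not reproduce the paper's qTA parameter extraction or the GBAM transformation.

```python
import math
from dataclasses import dataclass


@dataclass
class QTATarget:
    """Hypothetical container for one syllable's qTA parameters."""
    m: float         # slope of the linear pitch target (assumed semitones/s)
    b: float         # height of the pitch target at syllable onset (semitones)
    lam: float       # strength of target approximation, lambda (1/s)
    duration: float  # syllable duration (s)


def qta_syllable(target, state, n_points=20):
    """Render one syllable's F0 with the qTA model.

    `state` is (f0, f0', f0'') at syllable onset, carried over from the end of
    the previous syllable so the contour stays smooth across boundaries.
    Returns (samples, end_state).
    """
    f0_0, d1_0, d2_0 = state
    m, b, lam = target.m, target.b, target.lam

    # Coefficients of the decaying transient, determined by the onset state.
    c1 = f0_0 - b
    c2 = d1_0 + c1 * lam - m
    c3 = (d2_0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0

    def f0_and_derivs(t):
        e = math.exp(-lam * t)
        p = c1 + c2 * t + c3 * t ** 2   # polynomial part of the transient
        dp = c2 + 2.0 * c3 * t          # first derivative of p
        ddp = 2.0 * c3                  # second derivative of p
        f = m * t + b + p * e
        d1 = m + (dp - lam * p) * e
        d2 = (ddp - 2.0 * lam * dp + lam ** 2 * p) * e
        return f, d1, d2

    # Evenly spaced sample times covering [0, duration], endpoints included.
    times = [target.duration * i / (n_points - 1) for i in range(n_points)]
    samples = [f0_and_derivs(t)[0] for t in times]
    return samples, f0_and_derivs(target.duration)


def qta_contour(targets, initial_f0, n_points=20):
    """Concatenate syllables, assuming F0 starts at rest (zero velocity/acceleration)."""
    state = (initial_f0, 0.0, 0.0)
    contour = []
    for tgt in targets:
        samples, state = qta_syllable(tgt, state, n_points)
        contour.extend(samples)
    return contour


if __name__ == "__main__":
    syllables = [QTATarget(m=0.0, b=90.0, lam=50.0, duration=0.3),
                 QTATarget(m=-20.0, b=88.0, lam=40.0, duration=0.25)]
    f0 = qta_contour(syllables, initial_f0=85.0)
    print([round(v, 2) for v in f0[::5]])
```

Under this sketch, the per-syllable (m, b, λ) vectors are the kind of quantity the described GBAM model maps from synthesized neutral speech to emotional speech; transformed parameters would then be passed back through a renderer such as qta_contour to produce the emotional F0 contour. Carrying the full (f0, f0', f0'') state across syllable boundaries is what keeps the concatenated contour continuous.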