Voice Conversion Using Dynamic Features for High Quality Transformation.

Wei Wang, Zhen Yang
DOI: https://doi.org/10.1117/12.855168
2010-01-01
Abstract: A novel voice morphing method is proposed that makes the speech of a source speaker sound as if it were uttered by a target speaker. The method is based on the Gaussian Mixture Model (GMM). However, traditional GMM-based conversion suffers from over-smoothing, and inaccuracies in the extracted feature information can cause discontinuities in the converted speech. To overcome these problems, we take into account the dynamic spectral features between frames and modify the conversion function to handle the discontinuities. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrogram (STRAIGHT) algorithm is used for analysis and synthesis. Objective and perceptual experiments show that the quality of speech converted by the proposed method is significantly better than that obtained with the traditional GMM method.
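The sketch below illustrates the two ingredients the abstract names: dynamic (inter-frame) features and the traditional GMM conversion function that the paper improves on. It is not the authors' method; the abstract does not specify the modified conversion function or the exact form of the dynamic features. It assumes (1) first-order delta features computed with the common window d_t = (c_{t+1} - c_{t-1}) / 2, repeating the first and last frames at the edges, and (2) the classical joint-density GMM mapping F(x) = sum_m p(m|x) [mu_m^y + Sigma_m^yx (Sigma_m^xx)^-1 (x - mu_m^x)], simplified here to diagonal covariances. All names (`deltas`, `with_deltas`, `Component`, `posteriors`, `convert`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


def deltas(frames: List[Vector]) -> List[Vector]:
    """First-order dynamic features d_t = (c_{t+1} - c_{t-1}) / 2.

    Assumption: edge frames are handled by repeating the first/last frame.
    """
    n = len(frames)
    out: List[Vector] = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def with_deltas(frames: List[Vector]) -> List[Vector]:
    """Concatenate each static frame with its delta frame: [c_t, d_t]."""
    return [s + d for s, d in zip(frames, deltas(frames))]


@dataclass
class Component:
    """One mixture component of a joint source/target GMM (diagonal assumption)."""
    weight: float   # mixture weight
    mu_x: Vector    # source mean
    mu_y: Vector    # target mean
    var_xx: Vector  # diagonal of Sigma_xx (source variances)
    cov_yx: Vector  # diagonal of Sigma_yx (source-target covariances)


def log_gauss_diag(x: Vector, mu: Vector, var: Vector) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mu, var)
    )


def posteriors(x: Vector, comps: List[Component]) -> List[float]:
    """p(m | x) under the source marginal, normalized via log-sum-exp."""
    logs = [math.log(c.weight) + log_gauss_diag(x, c.mu_x, c.var_xx) for c in comps]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x: Vector, comps: List[Component]) -> Vector:
    """Traditional GMM mapping: sum_m p(m|x) [mu_y + Sigma_yx / Sigma_xx * (x - mu_x)]."""
    post = posteriors(x, comps)
    y = [0.0] * len(comps[0].mu_y)
    for p, c in zip(post, comps):
        for i in range(len(y)):
            y[i] += p * (c.mu_y[i] + c.cov_yx[i] / c.var_xx[i] * (x[i] - c.mu_x[i]))
    return y


if __name__ == "__main__":
    source = [[0.0], [1.0], [3.0]]   # toy 1-D spectral track
    feats = with_deltas(source)       # [[0.0, 0.5], [1.0, 1.5], [3.0, 1.0]]
    gmm = [Component(1.0, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.5, 0.5])]
    print([convert(f, gmm) for f in feats])  # [[1.0, 1.25], [1.5, 1.75], [2.5, 1.5]]
```

In a full system, each STRAIGHT-derived spectral frame would be augmented with `with_deltas` before training and conversion. Because this per-frame mapping treats the delta dimensions like any other, it does not by itself force the converted static and delta features to agree; trajectory-based generation (for example, maximum-likelihood parameter generation) addresses that and is omitted from this sketch.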
What problem does this paper attempt to address?