Abstract: A voice conversion system modifies a source speaker's voice so that it is perceived as having been uttered by another (target) speaker, and it is now widely used in real applications. However, most research focuses on only one aspect of voice conversion performance, and little theoretical analysis or experimental comparison covering the whole source-target speaker voice conversion process has been reported. Therefore, this paper presents a comprehensive analysis of source-target speaker voice conversion organized around three key steps: acoustic feature selection and extraction, conversion model construction, and target speech synthesis. Building on this analysis, a complete and optimized source-target speaker voice conversion process is proposed. First, a simple and direct serial feature fusion that combines prosodic features, spectral parameters, and spectral envelope characteristics is proposed. Then, to avoid discontinuities and spectral distortion in the converted speech, D_GMM (Dynamic Gaussian Mixture Model), which takes dynamic information between frames into account, is presented. Subsequently, the STRAIGHT synthesizer is modified to synthesize speech from the combined features. Finally, objective comparative experiments show that the proposed source-target voice conversion process outperforms conventional methods. In addition, both objective evaluation (using a speaker recognition system) and subjective evaluation are used to assess the converted speech, and the results show that it has higher target-speaker individuality and speech quality.
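The abstract does not give the D_GMM equations. As a minimal sketch, assuming that "dynamic information between frames" means first-order delta features and that D_GMM builds on the conventional joint-density GMM mapping, the NumPy code below illustrates serial feature fusion, delta computation, and the per-frame conversion function F(x) = sum_m p(m|x) [mu_y^m + Sigma_yx^m (Sigma_xx^m)^-1 (x - mu_x^m)]. The function names (`delta`, `fuse_features`, `log_gauss`, `convert`) and the 3-frame delta window are hypothetical choices for illustration; GMM training (EM on time-aligned source-target frames) and STRAIGHT analysis/synthesis are omitted, and this is not the paper's D_GMM.

```python
import numpy as np


def delta(feats: np.ndarray) -> np.ndarray:
    """First-order dynamic features, (x[t+1] - x[t-1]) / 2, with edge frames repeated."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def fuse_features(log_f0: np.ndarray, spectrum: np.ndarray, envelope: np.ndarray) -> np.ndarray:
    """Serial fusion: concatenate per-frame streams, then append their deltas.

    log_f0: (T,), spectrum: (T, Ds), envelope: (T, De) -> (T, 2 * (1 + Ds + De)).
    """
    static = np.hstack([log_f0.reshape(-1, 1), spectrum, envelope])
    return np.hstack([static, delta(static)])


def log_gauss(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Log density of a full-covariance Gaussian evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    maha = np.einsum("ti,ti->t", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert(x, weights, means, covs):
    """Joint-density GMM mapping applied frame by frame.

    x: (T, D) source frames; weights: (M,);
    means: (M, 2D) joint [source; target] means; covs: (M, 2D, 2D) joint covariances.
    Returns (T, D) converted frames.
    """
    T, D = x.shape
    M = len(weights)
    # Posterior p(m | x) from each component's source marginal, in the log domain for stability.
    log_lik = np.stack(
        [np.log(weights[m]) + log_gauss(x, means[m, :D], covs[m, :D, :D]) for m in range(M)],
        axis=1,
    )
    log_lik -= log_lik.max(axis=1, keepdims=True)
    post = np.exp(log_lik)
    post /= post.sum(axis=1, keepdims=True)

    y = np.zeros((T, D))
    for m in range(M):
        mu_x, mu_y = means[m, :D], means[m, D:]
        s_xx, s_yx = covs[m, :D, :D], covs[m, D:, :D]
        # Conditional mean mu_y + S_yx S_xx^{-1} (x - mu_x), written for row vectors.
        cond = mu_y + (x - mu_x) @ np.linalg.solve(s_xx, s_yx.T)
        y += post[:, m : m + 1] * cond
    return y
```

When `convert` is applied to fused vectors, the first half of each output row holds the converted static features and the second half the predicted deltas. A trajectory-based method such as maximum-likelihood parameter generation would use those predicted deltas to smooth the output over time; this sketch only maps each frame independently.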

Voice Conversion Using Dynamic Features for High Quality Transformation

Voice Conversion Using Dynamic Inter-Frame Features

An Improved Method for Voice Conversion Based on Gaussian Mixture Model

GMM-based Voice Conversion with Explicit Modelling on Feature Transform

Voice Conversion Using Improved Spectral and F0 Transformation Methods

Voice Conversion with Smoothed GMM and MAP Adaptation

Comprehensive Voice Conversion Analysis Based on DGMM and Feature Combination

Voice Conversion Based on STRAIGHT and UBM-GMM

A Dynamic Gaussian Process for Voice Conversion

Voice Conversion Based on Gaussian Mixture Modules with Minimum Distance Spectral Mapping

Voice Conversion Based on Improved GMM and Spectrum with Synchronous Prosody

Voice Conversion with High Naturalness Using Spectrum and Super-segmental Feature Transform

Comprehensive Source-Target Speaker Voice Conversion Analysis

Voice Conversion with a Strategy for Separating Speaker Individuality Using State-Space Model

An Improved Spectral and Prosodic Transformation Method in STRAIGHT-Based Voice Conversion

Voice Conversion Using Structured Gaussian Mixture Model

A Hybrid Method to Convert Acoustic Features for Voice Conversion

Voice Conversion Using Deep Neural Network in Super-Frame Feature Space

Improving Voice Quality of HMM-Based Speech Synthesis Using Voice Conversion Method

A Modified Method for Voice Conversion Based on GMM

Voice Conversion Based on Mapping Formants