Abstract: In this paper, we propose parameter generation methods using rich context models as yet another hybrid method combining Hidden Markov Model (HMM)-based speech synthesis and unit selection synthesis. Traditional HMM-based speech synthesis enables flexible modeling of acoustic features based on a statistical approach. However, the generated speech parameters tend to be excessively smoothed. To address this problem, several hybrid methods combining HMM-based speech synthesis and unit selection synthesis have been proposed. Although they significantly improve the quality of synthetic speech, they usually lose the flexibility of the original HMM-based speech synthesis. In the proposed methods, we use rich context models, which are statistical models that represent individual acoustic parameter segments. In training, the rich context models are reformulated as Gaussian Mixture Models (GMMs). In synthesis, initial speech parameters are generated from probability distributions over-fitted to individual segments, and the speech parameter sequence is then iteratively generated from the GMMs using a parameter generation method based on the maximum likelihood criterion. Since the basic framework of the proposed methods is still the same as the traditional framework, the capability of flexibly modeling acoustic features remains. The experimental results demonstrate that: (1) approximation with a single Gaussian component sequence yields better synthetic speech quality than the EM algorithm in the proposed parameter generation method, (2) state-based model selection yields quality improvements at the same level as frame-based model selection, (3) the use of initial parameters generated from the over-fitted speech probability distributions is very effective in further improving speech quality, and (4) the proposed methods for spectral and $F_{0}$ components yield significant improvements in synthetic speech quality compared with traditional HMM-based speech synthesis.
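The iterative generation step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional static feature with a single delta window, diagonal covariances, and edge frames replicated when computing deltas, none of which are stated in the abstract. All function names (`select_components`, `mlpg`, `generate`) and the tuple layout of a mixture component are hypothetical. Starting from initial parameters, the sketch picks the most likely Gaussian component per frame (the single Gaussian component sequence approximation), solves the maximum likelihood parameter generation equations for that sequence, and repeats until the selected sequence no longer changes.

```python
import math

def log_gauss(x, mean, var):
    # Log density of a 1-D Gaussian.
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def window_rows(T):
    # For each frame t: the static row and the delta row of the window matrix W.
    # Assumed delta window: 0.5 * (c[t+1] - c[t-1]) with edge frames replicated.
    rows = []
    for t in range(T):
        static = [0.0] * T
        static[t] = 1.0
        dyn = [0.0] * T
        dyn[min(t + 1, T - 1)] += 0.5
        dyn[max(t - 1, 0)] -= 0.5
        rows.append((static, dyn))
    return rows

def solve(A, b):
    # Gaussian elimination with partial pivoting for A x = b.
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def mlpg(means, variances):
    # Solve (W^T P W) c = W^T P mu, where P holds the diagonal precisions.
    T = len(means)
    rows = window_rows(T)
    A = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    for t in range(T):
        for k in range(2):  # k = 0: static stream, k = 1: delta stream
            w, p = rows[t][k], 1.0 / variances[t][k]
            for i in range(T):
                if w[i] == 0.0:
                    continue
                b[i] += w[i] * p * means[t][k]
                for j in range(T):
                    A[i][j] += w[i] * p * w[j]
    return solve(A, b)

def select_components(c, gmms):
    # Pick, per frame, the component with the highest weighted likelihood
    # of the current static and delta values.
    T, seq = len(c), []
    for t in range(T):
        d = 0.5 * (c[min(t + 1, T - 1)] - c[max(t - 1, 0)])
        scores = [math.log(w) + log_gauss(c[t], mu[0], var[0])
                  + log_gauss(d, mu[1], var[1]) for (w, mu, var) in gmms[t]]
        seq.append(scores.index(max(scores)))
    return seq

def generate(gmms, init, max_iter=20):
    # gmms[t] is a list of (weight, (mu_static, mu_delta), (var_static, var_delta)).
    c, prev, seq = list(init), None, None
    for _ in range(max_iter):
        seq = select_components(c, gmms)
        if seq == prev:  # component sequence has converged
            break
        c = mlpg([gmms[t][m][1] for t, m in enumerate(seq)],
                 [gmms[t][m][2] for t, m in enumerate(seq)])
        prev = seq
    return c, seq
```

In this sketch the `init` argument plays the role of the initial parameters that the abstract describes as generated from the over-fitted distributions. Because component selection depends on the current parameters, the choice of `init` determines which components are picked first and therefore where the iteration ends up.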

Acoustic statistical modeling based new generation speech synthesis technology

Acoustic Statistical Modeling Based Speech Synthesis Technologies

Statistical Acoustic Model Based Unit Selection Algorithm for Speech Synthesis

USTC System for Blizzard Challenge 2006: An Improved HMM-based Speech Synthesis Method

The USTC and iFlytek Speech Synthesis Systems for Blizzard Challenge 2007

Trainable Unit Selection Speech Synthesis under Statistical Framework

Integrating Articulatory Features into HMM-Based Parametric Speech Synthesis

Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework

Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score

Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

Study about Chinese Speech Synthesis Algorithm and Acoustic Model Based on Wireless Communication Network

Voiced/unvoiced Decision Algorithm for HMM-based Speech Synthesis

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis

New synthesis method based on LMA vocal tract model

The USTC System for Blizzard Challenge 2008

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks