Abstract: This paper presents a realistic visual speech synthesis method based on hybrid concatenation. Unlike previous methods based on phoneme-level unit selection or hidden Markov models (HMMs), the hybrid concatenation method combines frame-level unit selection with a fused HMM, and generates more expressive and stable facial animations. The fused HMM explicitly models the loose synchronization of tightly coupled streams and yields much better audiovisual mapping results than a conventional HMM. Once the fused HMM has been trained, facial animation is generated by frame-level unit selection driven by the fused HMM output probabilities. To speed up unit selection on a large corpus, this paper also proposes a two-layer Viterbi search in which only the subsets selected in the first layer are examined further in the second layer. With this search, the system has been successfully integrated into real-time applications. The paper further proposes a mapping method, based on Gaussian mixture models (GMMs), that generates emotional facial expressions from neutral ones. Experiments show that the method outputs high-quality synthesized facial parameters. Compared with other audiovisual mapping methods, it performs better in expressiveness, stability, and running speed.
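
The two-layer search mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `two_layer_viterbi`, the cost matrices, and the `keep` parameter are all hypothetical, and the first layer here simply keeps the lowest target-cost candidates per frame (standing in for negative log output probabilities from the fused HMM). The second layer then runs an ordinary Viterbi search restricted to those surviving candidates.

```python
import numpy as np

def two_layer_viterbi(target_cost, concat_cost, keep):
    """Pruned unit selection (illustrative sketch, not the paper's code).

    target_cost: (T, N) array, cost of candidate unit n at frame t
                 (assumed here to be a negative log output probability).
    concat_cost: (N, N) array, cost of joining unit i to unit j.
    keep:        number of candidates the first layer retains per frame.
    Returns (path, total_cost), where path holds one unit index per frame.
    """
    T, N = target_cost.shape
    keep = min(keep, N)

    # Layer 1: cheap pruning. Keep the `keep` lowest target-cost units per frame.
    subsets = np.argsort(target_cost, axis=1)[:, :keep]  # shape (T, keep)

    # Layer 2: Viterbi search over the surviving candidates only.
    score = target_cost[0, subsets[0]].astype(float)
    back = np.zeros((T, keep), dtype=int)
    for t in range(1, T):
        prev, cur = subsets[t - 1], subsets[t]
        # trans[i, j]: best cost ending in prev[i], plus the join cost to cur[j].
        trans = score[:, None] + concat_cost[np.ix_(prev, cur)]
        back[t] = np.argmin(trans, axis=0)
        score = trans[back[t], np.arange(keep)] + target_cost[t, cur]

    # Backtrack from the cheapest final candidate.
    j = int(np.argmin(score))
    total = float(score[j])
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = int(subsets[t, j])
        j = int(back[t, j])
    return path, total
```

With `keep` equal to the number of candidates the search is exact; smaller values reduce the per-frame work from N² to `keep`² transitions, at the risk of pruning away the globally best path.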

Voiced/unvoiced Decision Algorithm for HMM-based Speech Synthesis

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

Statistical Acoustic Model Based Unit Selection Algorithm for Speech Synthesis

A Hierarchical Viterbi Algorithm For Mandarin Hybrid Speech Synthesis System

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech

HMM-based Unit Selection Speech Synthesis Using Log Likelihood Ratios Derived from Perceptual Data

An unvoiced/voiced duration adjustment algorithm based on context features in mandarin TTS

Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score

A state duration generation algorithm considering global variance for HMM-based speech synthesis

Stable boundary-based non-uniform unit selection in speech synthesis

Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis

HMM-Based Hierarchical Unit Selection Combining Kullback-Leibler Divergence with Likelihood Criterion

Voiced/Unvoiced Parameters Recovery Based on Second-Order Hidden Markov Model

HMM-based Unit Selection Using F

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis

Optimization Method for Unit Selection Speech Synthesis Based on Synthesis Quality Predictions

Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method

A Novel Hybrid Approach for Mandarin Speech Synthesis

Improved Recovery Algorithm for Unvoiced/voiced Parameters Based on GMM

A Novel HTS System Using both Continuous HMMs and Discrete HMMs

Global Variance Modeling on Frequency Domain Delta LSP for HMM-based Speech Synthesis