Abstract: This paper presents a novel spectral modeling method for statistical parametric speech synthesis based on a hidden trajectory model (HTM). An HTM is a structured generative model implemented in two stages. First, hidden formant trajectories are generated from time-aligned formant target sequences by a bidirectional filter. This target-filtering model provides a correlation structure across temporal frames and efficiently describes the effect of co-articulation on speech signals. Second, the observed cepstral features are composed of a formant-related component and a residual component. The formant-related component is predicted from the hidden formant trajectories using a nonlinear analytical function, and the prediction residuals are modeled by context-dependent Gaussians. We apply HTM-based acoustic modeling to speech synthesis and investigate its effectiveness in improving the naturalness and controllability of synthetic speech. Experimental results show that the proposed method improves the accuracy of spectral feature prediction and the naturalness of synthetic speech compared with the conventional HMM-based method, especially when the amount of training data is limited. Furthermore, the method achieves effective control over the vowel quality and formant sharpness of synthetic speech through manipulation of the distribution parameters of the phone-dependent targets for formant frequencies and bandwidths. © 2015 Elsevier B.V. All rights reserved.
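
The two stages described in the abstract can be illustrated with a small sketch. The abstract does not give the filter's form or the prediction function, so everything concrete here is an assumption made for illustration: the filter is taken to be a symmetric FIR smoother with taps decaying as `gamma**|k|`, and the formant-to-cepstrum mapping is taken to be the standard cepstrum of a bank of all-pole resonators, c_n = (2/n) Σ_p exp(-π n b_p / f_s) cos(2π n f_p / f_s). The names `bidirectional_filter`, `formants_to_cepstrum`, `gamma`, and `half_width` are hypothetical, and the residual Gaussians are omitted.

```python
import math


def bidirectional_filter(targets, gamma=0.6, half_width=7):
    """Smooth a per-frame target sequence with a symmetric (non-causal) FIR filter.

    Assumption: taps decay as gamma**|k| and are renormalised at the sequence
    edges, so a constant target sequence is reproduced unchanged.
    """
    n_frames = len(targets)
    trajectory = []
    for t in range(n_frames):
        weighted_sum = 0.0
        weight_total = 0.0
        for k in range(-half_width, half_width + 1):
            if 0 <= t + k < n_frames:
                w = gamma ** abs(k)
                weighted_sum += w * targets[t + k]
                weight_total += w
        trajectory.append(weighted_sum / weight_total)
    return trajectory


def formants_to_cepstrum(freqs_hz, bandwidths_hz, n_ceps=12, sample_rate=16000.0):
    """Map formant frequencies and bandwidths of one frame to cepstral coefficients.

    Uses c_n = (2/n) * sum_p exp(-pi*n*b_p/fs) * cos(2*pi*n*f_p/fs), n = 1..n_ceps.
    """
    ceps = []
    for n in range(1, n_ceps + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate)
            total += decay * math.cos(2.0 * math.pi * n * f / sample_rate)
        ceps.append(2.0 / n * total)
    return ceps


# Hypothetical F1 targets (Hz) for two consecutive phones, 10 frames each.
f1_targets = [300.0] * 10 + [700.0] * 10
f1_trajectory = bidirectional_filter(f1_targets)
```

In this sketch the step between the two phone targets becomes a gradual transition in `f1_trajectory`, which is the kind of co-articulation effect the abstract attributes to the target-filtering stage. Lowering a bandwidth passed to `formants_to_cepstrum` increases the magnitude of that formant's contribution, which corresponds loosely to the formant sharpness control mentioned above.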

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis

Evaluation of parameter generation using high order dynamic features and long span windows for HMM based speech synthesis

Integrating Articulatory Features into HMM-Based Parametric Speech Synthesis

Minimum Kullback–Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

Articulatory Control of HMM-based Parametric Speech Synthesis Driven by Phonetic Knowledge

HMM-based Unit Selection Using Frame Sized Speech Segments

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis

Optimization Method for Unit Selection Speech Synthesis Based on Synthesis Quality Predictions

Latent Correlation Analysis of HMM Parameters for Speech Recognition

Voiced/unvoiced Decision Algorithm for HMM-based Speech Synthesis

Statistical Acoustic Model Based Unit Selection Algorithm for Speech Synthesis

Statistical Parametric Speech Synthesis Using a Hidden Trajectory Model

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

HMM-based Emphatic Speech Synthesis for Corrective Feedback in Computer-Aided Pronunciation Training

Learning Kernel-based HMMs for Dynamic Sequence Synthesis

Cross-Stream Dependency Modeling for HMM-Based Speech Synthesis

Parametric model of introducing inter-frame correlation information into hidden Markov model for speech recognition

DNN-based Stochastic Postfilter for HMM-based Speech Synthesis

A Novel HTS System Using both Continuous HMMs and Discrete HMMs