Abstract: A novel spectral modeling method for statistical parametric speech synthesis using a hidden trajectory model (HTM) is presented in this paper. An HTM is a structured generative model with a two-stage implementation. First, hidden formant trajectories are generated from time-aligned formant target sequences by a bidirectional filter. This target-filtering model can provide a correlation structure across temporal frames and efficiently describe the effect of co-articulation on speech signals. Then, the observed cepstral features are composed of a formant-related component and a residual component. The formant-related component is predicted from the hidden formant trajectories using a nonlinear analytical function, and the prediction residuals are modeled by context-dependent Gaussians. In this paper, we apply HTM-based acoustic modeling to speech synthesis and investigate the effectiveness of this method in improving the naturalness and controllability of synthetic speech. Experimental results show that the proposed method improves the accuracy of spectral feature prediction and the naturalness of synthetic speech compared with the conventional HMM-based method, especially when the amount of training data is limited. Furthermore, this method achieves effective control over the vowel quality and formant sharpness of synthetic speech through straightforward manipulation of the distribution parameters of the phone-dependent formant frequency and bandwidth targets. (C) 2015 Elsevier B.V. All rights reserved.
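The two-stage generative process described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The sketch assumes a target-filtering form commonly used for HTMs: a symmetric FIR filter with weights γ^|j| over ±D frames, normalized to unit gain. It also assumes edge padding at utterance boundaries. For the nonlinear formant-to-cepstrum mapping, it uses the standard cepstrum of an all-pole filter with one resonance (pole pair) per formant. The function names, γ, D, and all target values are hypothetical.

```python
import math
from typing import List, Sequence


def filter_targets(targets: Sequence[float], gamma: float, D: int) -> List[float]:
    """Bidirectional FIR filtering of a frame-level target sequence (assumed form).

    z[k] = c * sum_{j=-D..D} gamma**|j| * t[k + j], where c normalises the
    weights to sum to one so that a constant target is reproduced exactly.
    Out-of-range frames reuse the nearest edge target (an assumption).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    offsets = range(-D, D + 1)
    weights = [gamma ** abs(j) for j in offsets]
    norm = sum(weights)  # equals (1 + gamma - 2*gamma**(D+1)) / (1 - gamma)
    n = len(targets)
    trajectory = []
    for k in range(n):
        acc = 0.0
        for j, w in zip(offsets, weights):
            acc += w * targets[min(max(k + j, 0), n - 1)]  # clamp to edges
        trajectory.append(acc / norm)
    return trajectory


def formants_to_cepstrum(freqs: Sequence[float], bws: Sequence[float],
                         fs: float, order: int) -> List[float]:
    """Cepstrum c_1..c_order of an all-pole filter, one pole pair per formant.

    c_n = sum_p (2/n) * exp(-pi*n*b_p/fs) * cos(2*pi*n*f_p/fs)
    """
    return [
        sum((2.0 / n) * math.exp(-math.pi * n * b / fs)
            * math.cos(2.0 * math.pi * n * f / fs)
            for f, b in zip(freqs, bws))
        for n in range(1, order + 1)
    ]


def predict_cepstrum_mean(freqs: Sequence[float], bws: Sequence[float],
                          residual_mean: Sequence[float], fs: float) -> List[float]:
    """Formant-related component plus the mean of a (hypothetical) residual Gaussian."""
    formant_part = formants_to_cepstrum(freqs, bws, fs, len(residual_mean))
    return [c + r for c, r in zip(formant_part, residual_mean)]


if __name__ == "__main__":
    fs = 16000.0
    # Hypothetical F1/F2 targets (Hz) for two phones, 10 frames each.
    f1 = filter_targets([700.0] * 10 + [300.0] * 10, gamma=0.6, D=5)
    f2 = filter_targets([1200.0] * 10 + [2300.0] * 10, gamma=0.6, D=5)
    print("F1 trajectory:", [round(v) for v in f1])
    bws = [80.0, 120.0]  # hypothetical bandwidth targets (Hz)
    frame = 10
    normal = formants_to_cepstrum([f1[frame], f2[frame]], bws, fs, 4)
    sharp = formants_to_cepstrum([f1[frame], f2[frame]], [b * 0.5 for b in bws], fs, 4)
    print("normal:", normal)
    print("sharp: ", sharp)
```

The filtered F1 trajectory moves smoothly between the two phone targets instead of jumping, which is the frame-to-frame correlation the target-filtering stage is meant to capture. Halving the bandwidth targets moves each pole closer to the unit circle (radius exp(-π b / f_s)). This increases the magnitude of the formant-related cepstral terms and gives a simplified view of the formant-sharpness control the abstract describes.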

A Novel HTS System Using both Continuous HMMs and Discrete HMMs

A Novel HMM-Based TTS System Using Both Continuous HMMs and Discrete HMMs

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

Amplitude Spectrum Based Excitation Model for HMM-Based Speech Synthesis

HMM based speech synthesis with Global Variance Training method

An HMM-based Cantonese speech synthesis system

Statistical Modification Based Post-Filtering Technique for HMM-based Speech Synthesis

Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis

Improving HMM Based Speech Synthesis by Reducing Over-Smoothing Problems

Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis

An Unified and Automatic Approach of Mandarin HTS System

High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Statistical Parametric Speech Synthesis Using a Hidden Trajectory Model

End-to-end Code-switched TTS with Mix of Monolingual Recordings

Formant-Controlled HMM-Based Speech Synthesis

Voiced/unvoiced Decision Algorithm for HMM-based Speech Synthesis

Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism

Evaluation of parameter generation using high order dynamic features and long span windows for HMM based speech synthesis

Speech Synthesis Based on Gaussian Conditional Random Fields

Cross-stream Dependency Modeling Using Continuous F0 Model for HMM-based Speech Synthesis

Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters