Abstract: This paper presents a new spectral modeling method for statistical parametric speech synthesis. Conventional hidden Markov model (HMM)-based parametric speech synthesis adopts high-level spectral parameters, such as mel-cepstra or line spectral pairs, as its features. The proposed method improves on the conventional approach in two ways. First, distributions of low-level, untransformed spectral envelopes (extracted by the STRAIGHT vocoder) are used as the parameters for synthesis. Second, instead of a single Gaussian distribution, we adopt graphical models with multiple hidden variables, including restricted Boltzmann machines (RBMs) and deep belief networks (DBNs), to represent the distribution of the low-level spectral envelopes at each HMM state. At synthesis time, the spectral envelopes are predicted from the RBM-HMMs or DBN-HMMs of the input sentence following the maximum output probability parameter generation criterion under the constraints of the dynamic features. A Gaussian approximation is applied to the marginal distribution of the visible stochastic variables in the RBM or DBN at each HMM state in order to achieve a closed-form solution to the parameter generation problem. Our experimental results show that both the RBM-HMM and the DBN-HMM generate better spectral envelope parameter sequences than the conventional Gaussian-HMM, with superior generalization capabilities, and that the DBN-HMM and RBM-HMM perform similarly, possibly because of the Gaussian approximation. As a result, the proposed method significantly alleviates the over-smoothing effect and improves the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra.
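
The closed-form parameter generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each frame already has a Gaussian (mean and variance) for one static feature and its delta, that covariances are diagonal, and that the delta is 0.5 * (c[t+1] - c[t-1]) with clamped edges. Under those assumptions the most probable static trajectory c solves (W' P W) c = W' P mu, where W stacks the static and delta windows and P holds the inverse variances. All function names are hypothetical.

```python
def build_window_matrix(num_frames):
    """Rows alternate static and delta windows, so that o = W c.

    Assumed delta window: 0.5 * (c[t+1] - c[t-1]), with indices clamped
    at the sequence edges.
    """
    rows = []
    for t in range(num_frames):
        static = [0.0] * num_frames
        static[t] = 1.0
        delta = [0.0] * num_frames
        delta[max(t - 1, 0)] -= 0.5
        delta[min(t + 1, num_frames - 1)] += 0.5
        rows.append(static)
        rows.append(delta)
    return rows


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def generate_trajectory(means, variances):
    """Return the static trajectory maximizing the Gaussian output probability.

    means and variances are lists of (static, delta) pairs, one per frame.
    """
    num_frames = len(means)
    w = build_window_matrix(num_frames)
    mu = [m for frame in means for m in frame]
    prec = [1.0 / v for frame in variances for v in frame]
    n_rows = 2 * num_frames
    # Normal equations: (W' P W) c = W' P mu
    a = [[sum(w[r][i] * prec[r] * w[r][j] for r in range(n_rows))
          for j in range(num_frames)] for i in range(num_frames)]
    b = [sum(w[r][i] * prec[r] * mu[r] for r in range(n_rows))
         for i in range(num_frames)]
    return solve_linear_system(a, b)


if __name__ == "__main__":
    # A step in the static means with zero delta means yields a smoothed step.
    step_means = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
    unit_vars = [(1.0, 1.0)] * 4
    print(generate_trajectory(step_means, unit_vars))
```

In this sketch the delta constraint is what couples neighbouring frames: without the delta rows, the solution would simply copy the per-frame static means. The dense solver is chosen for readability; practical systems exploit the banded structure of W' P W.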

Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis

Amplitude Spectrum Based Excitation Model for HMM-Based Speech Synthesis

Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis

An Excitation Model Based On Inverse Filtering For Speech Analysis And Synthesis

Pitch-scaled Analysis Based Residual Reconstruction for Speech Analysis and Synthesis

Optimization of Pitch Preprocessing in TETRA Speech Encoder

A Novel HTS System Using both Continuous HMMs and Discrete HMMs

Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis

Investigation of the Spectral Envelope Estimation Vocoder and Improved Pitch Estimation Based on the Sinusoidal Speech Model

Modulation Spectrum Compensation for HMM-Based Speech Synthesis Using Line Spectral Pairs

An initial research: Towards accurate pitch extraction for speech synthesis based on BLSTM

Statistical Modification Based Post-Filtering Technique for HMM-based Speech Synthesis

HMM based speech synthesis with Global Variance Training method

Prosody Modification for Vocoder Based on Amplitude Spectrum of Residual Signal

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram

Pitch Preservation In Singing Voice Synthesis

Modeling Glottal Effect on the Spectral Envelop of STRAIGHT Using Mixture of Gaussians

Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis

DCT_M Model for Excitation Parameter in Low Bit Rate Vocoder

Improving HMM Based Speech Synthesis by Reducing Over-Smoothing Problems

A Novel HMM-Based TTS System Using Both Continuous HMMs and Discrete HMMs