DBN-based Spectral Feature Representation for Statistical Parametric Speech Synthesis

Ya-Jun Hu, Zhen-Hua Ling
DOI: https://doi.org/10.1109/lsp.2016.2516032
2016-01-01
IEEE Signal Processing Letters
Abstract: This letter presents a method of deriving spectral features using a deep belief network (DBN) for hidden Markov model (HMM)-based parametric speech synthesis. At training time, a DBN is estimated to represent the high-dimensional spectral envelopes and is then used to transform them into binary codes. These DBN-based binary codes (DBCs) are used as spectral features for HMM modeling. At synthesis time, spectral envelopes are recovered from the predicted DBC sequences and then used for waveform reconstruction. Experimental results show that the proposed method achieves better naturalness than the conventional method that uses mel-cepstra as spectral features and considers global variance (GV) during parameter generation.
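The encode/decode round trip described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes (513, 256, 64), the 0.5 binarization threshold, the linear bottom layer in the decoder, and all function names are assumptions made for illustration, and the weights are random rather than trained, so the recovered envelopes are not meaningful.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_layers(sizes, rng):
    """Create random (untrained) parameters for a stack of RBM-like layers."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        layers.append({
            "W": rng.normal(0.0, 0.1, size=(n_in, n_out)),
            "b_hid": np.zeros(n_out),
            "b_vis": np.zeros(n_in),
        })
    return layers


def encode(envelopes, layers):
    """Propagate envelopes upward, then binarize the top-layer activations."""
    h = envelopes
    for layer in layers:
        h = sigmoid(h @ layer["W"] + layer["b_hid"])
    # Assumed binarization rule: threshold activation probabilities at 0.5.
    return (h > 0.5).astype(np.uint8)


def decode(codes, layers):
    """Propagate binary codes downward to recover real-valued envelopes."""
    h = codes.astype(np.float64)
    for i, layer in enumerate(reversed(layers)):
        pre = h @ layer["W"].T + layer["b_vis"]
        # Assumption: the bottom (visible) layer is linear, hidden layers are sigmoid.
        h = pre if i == len(layers) - 1 else sigmoid(pre)
    return h


rng = np.random.default_rng(0)
layers = init_layers([513, 256, 64], rng)   # hypothetical dimensions
frames = rng.normal(size=(5, 513))          # stand-in for spectral envelopes
codes = encode(frames, layers)              # binary codes, one row per frame
recovered = decode(codes, layers)           # envelopes recovered from the codes
```

In the pipeline the abstract describes, the codes would be modeled by HMMs and predicted at synthesis time before decoding; this sketch skips that stage and decodes the codes directly.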