Amplitude Spectrum Based Excitation Model for HMM-Based Speech Synthesis

Zhengqi Wen, Jianhua Tao
DOI: https://doi.org/10.21437/interspeech.2012-365
2012-01-01
Abstract: This paper describes an amplitude-spectrum-based excitation model for hidden Markov model (HMM)-based speech synthesis systems (HTS). The residual signal obtained by inverse filtering is decomposed into periodic and aperiodic spectra in the frequency domain. An amplitude spectrum of half the pitch period length is retained as the periodic component; in the synthesis stage, a zero-phase criterion and the pitch-synchronous overlap-add (PSOLA) method are used to reconstruct the residual signal. Before the excitation model is integrated into HTS, the periodic spectra are normalized and the Linde-Buzo-Gray (LBG) algorithm is used to construct codebooks for every Mandarin final. The indices into these codebooks, which serve as the excitation information, are then included in HTS training together with the spectral, F0, and aperiodic parameters. Listening tests showed that, for a female voice, the analysis-synthesis quality of the vocoder based on the proposed excitation model is comparable to that of STRAIGHT, and that the quality of the generated speech is also improved when the model is integrated into HTS.
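The zero-phase reconstruction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `zero_phase_pulse` and `overlap_add` are hypothetical, the one-sided spectrum is assumed to hold bins 0..N/2 of an even-length DFT, and the plain overlap-add below is a simplified stand-in for PSOLA (no windowing, no aperiodic component).

```python
import math


def zero_phase_pulse(half_amp):
    """Build one real, symmetric pulse from a one-sided amplitude spectrum.

    Assumption: half_amp holds bins 0..N/2 of an even-length-N DFT.
    With every phase set to zero, the inverse DFT reduces to a cosine sum.
    """
    if len(half_amp) < 2:
        raise ValueError("need at least two spectral bins")
    n_fft = 2 * (len(half_amp) - 1)
    # Mirror the one-sided spectrum into a full, symmetric spectrum.
    full = list(half_amp) + list(half_amp[-2:0:-1])
    pulse = [
        sum(a * math.cos(2.0 * math.pi * k * n / n_fft)
            for k, a in enumerate(full)) / n_fft
        for n in range(n_fft)
    ]
    # Rotate so that sample 0 (the peak for non-negative amplitudes)
    # sits at the centre of the frame.
    half = n_fft // 2
    return pulse[half:] + pulse[:half]


def overlap_add(pulses, pitch_marks, length):
    """Place each pulse so its centre falls on a pitch mark, then sum."""
    out = [0.0] * length
    for pulse, mark in zip(pulses, pitch_marks):
        start = mark - len(pulse) // 2
        for i, value in enumerate(pulse):
            j = start + i
            if 0 <= j < length:  # drop samples outside the buffer
                out[j] += value
    return out


# A flat amplitude spectrum with zero phase yields a unit impulse.
pulse = zero_phase_pulse([1.0] * 5)
residual = overlap_add([pulse, pulse], [4, 12], 16)
```

In this toy run the two pitch marks are 8 samples apart, so the reconstructed residual is an impulse train with one pulse per pitch period. A codebook-based system would look up `half_amp` from the stored index instead of passing it in directly.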