An Excitation Model Based On Inverse Filtering For Speech Analysis And Synthesis

Zhengqi Wen, Jianhua Tao
DOI: https://doi.org/10.1109/MLSP.2011.6064574
2011-01-01
Abstract: Speech synthesized by LPC-like vocoders suffers from a characteristic buzzy quality, mostly because the excitation is either a pulse train or white Gaussian noise. This paper proposes a new excitation model that reconstructs the residual signal obtained by inverse filtering. In every speech frame, a residual segment two pitch periods long is extracted for spectral analysis. An amplitude spectrum of only half the pitch-period length is preserved in the synthesis stage, and a zero-phase criterion is used to synthesize each excitation frame. The excitation signal is then constructed by the pitch-synchronous overlap-add (PSOLA) method. Speech synthesized with this excitation model achieves a CMOS of 1.56 relative to the traditional excitation model. The Mel-generalized cepstrum (MGC) and the LBG algorithm are then applied to compress the amplitude spectrum of the proposed excitation model. MSE distortion and listening tests show that the LBG algorithm compresses the amplitude spectrum better than MGC.
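The zero-phase synthesis and overlap-add steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the Hanning windows, and the centring of each frame on a pitch mark are assumptions made for illustration. The reduction of the amplitude spectrum to half the pitch-period length is omitted, because the abstract does not say how it is done.

```python
import numpy as np

def zero_phase_frame(residual_frame):
    """Turn a two-pitch-period residual frame into a zero-phase excitation frame.

    Hypothetical helper: the analysis window is an assumption.
    """
    n = len(residual_frame)
    # Amplitude spectrum of the windowed residual; the phase is discarded.
    amplitude = np.abs(np.fft.rfft(residual_frame * np.hanning(n)))
    # Zero phase: invert the real, non-negative spectrum directly.
    frame = np.fft.irfft(amplitude, n=n)
    # The inverse transform peaks at index 0; rotate the peak to the centre.
    return np.fft.fftshift(frame)

def overlap_add(frames, pitch_marks, length):
    """Overlap-add windowed frames, each centred on its pitch mark (assumed layout)."""
    out = np.zeros(length)
    for frame, mark in zip(frames, pitch_marks):
        n = len(frame)
        start = mark - n // 2
        lo, hi = max(start, 0), min(start + n, length)
        if lo < hi:
            windowed = frame * np.hanning(n)
            out[lo:hi] += windowed[lo - start:hi - start]
    return out

# Toy usage: white noise stands in for a real inverse-filtered residual.
rng = np.random.default_rng(0)
period = 80                                   # pitch period in samples (assumed)
residual = rng.standard_normal(2 * period)    # one two-period residual frame
frame = zero_phase_frame(residual)
excitation = overlap_add([frame] * 3, [80, 160, 240], 320)
```

A zero-phase frame is symmetric about its centre, with its energy concentrated in a single pulse-like peak. Placing one such frame at each pitch mark keeps the pitch-synchronous structure of a pulse train, while the frame's amplitude spectrum comes from the analyzed residual.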