Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion

Jie Wu, Dongyan Huang, Lei Xie, Haizhou Li
DOI: https://doi.org/10.21437/interspeech.2017-694
2017-01-01
Abstract: This paper studies post-processing in deep bidirectional Long Short-Term Memory (DBLSTM) based voice conversion, where the statistical parameters are optimized to generate speech that exhibits properties similar to those of the target speech. However, a residual error always remains between the converted speech and the target speech. We reformulate the residual error problem as speech restoration, which aims to recover the target speech samples from the converted ones. Specifically, we propose a denoising recurrent neural network (DeRNN) that introduces regularization during training to shape the distribution of the converted data in latent space. We compare the proposed approach with global variance (GV), modulation spectrum (MS) and recurrent neural network (RNN) based postfilters, which serve a similar purpose. Subjective test results show that the proposed approach significantly outperforms these conventional approaches in terms of quality and similarity.