GTDNN-Based Voice Conversion Using DAEs with Binary Distributed Hidden Units

Yi-Yang Ding, Ya-Jun Hu, Zhen-Hua Ling
DOI: https://doi.org/10.1109/iscslp.2018.8706574
2018-01-01
Abstract: This paper proposes a method that adopts deep autoencoders with binary distributed hidden units (BDAE) as feature extractors in generatively trained DNNs (GTDNN) for voice conversion (VC). In this method, the source and target speakers are modeled separately by two BDAEs, and the extracted high-level features are used to train a Bernoulli bidirectional associative memory network (BBAM) for feature mapping. The estimated model parameters are then copied to construct a DNN for voice conversion. Compared with other neural network-based feature extractors, such as deep belief networks (DBN) and conventional deep autoencoders (DAE), BDAEs obtain a better balance between reconstruction error and the degree of binarization of hidden units. Our experimental results show that the GTDNN using BDAEs achieved better naturalness and similarity of converted speech than those using DBNs and conventional DAEs. Further experiments show that the BDAE model for the target speaker played a more important role than the one for the source speaker.
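The parameter-copying step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the single-layer encoder and decoder, the sigmoid activations, the linear output layer, and every name below (`random_layer`, `build_conversion_dnn`, `convert`) are assumptions made for illustration, and random weights stand in for parameters that would come from the separately trained source BDAE, BBAM, and target BDAE.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
D, H = 40, 64  # hypothetical spectral-feature and hidden-layer sizes

def random_layer(n_in, n_out):
    # Placeholder for trained parameters: (weights, bias).
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

# Stand-ins for the three separately estimated components (assumed shapes).
src_encoder = random_layer(D, H)  # encoder half of the source-speaker BDAE
mapping = random_layer(H, H)      # source-to-target direction of the BBAM
tgt_decoder = random_layer(H, D)  # decoder half of the target-speaker BDAE

def build_conversion_dnn(src_encoder, mapping, tgt_decoder):
    # Copy the estimated parameters into one feed-forward stack.
    layers = [(W.copy(), b.copy()) for W, b in (src_encoder, mapping, tgt_decoder)]

    def convert(x):
        h = x
        for W, b in layers[:-1]:
            h = sigmoid(h @ W + b)  # hidden activations in (0, 1)
        W, b = layers[-1]
        return h @ W + b            # linear output layer (an assumption)

    return convert

convert = build_conversion_dnn(src_encoder, mapping, tgt_decoder)
frames = rng.normal(size=(5, D))    # five hypothetical source feature frames
converted = convert(frames)
```

Here `convert` maps each source-speaker frame through the copied encoder, mapping, and decoder layers in sequence, so `converted` has the same shape as `frames`.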