Voice Conversion Using Conditional Restricted Boltzmann Machine

Fengyun Zhu, Ziye Fan, Xihong Wu
DOI: https://doi.org/10.1109/chinasip.2014.6889212
2014-01-01
Abstract: In this paper, we propose a new method for voice conversion using a conditional restricted Boltzmann machine (conditional RBM, CRBM). The joint distribution of source and target acoustic features is modeled by the RBM part of the model. Short-term temporal constraints are introduced by conditioning on contextual frames, namely the past and future frames of the source speaker. In contrast to conventional methods, the temporal structure of the data can be modeled without using dynamic features. Objective and subjective experiments were conducted to evaluate the method. Experimental results show that short-term temporal structure is modeled well by the CRBM, and that the proposed method significantly outperforms the conventional method based on joint density Gaussian mixture models.
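The conditioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stack_context` and `conditioned_bias`, the edge padding by frame repetition, the exclusion of the current frame from the context, and the window sizes are all assumptions made for illustration. The only part taken from the general CRBM formulation is that the context vector shifts a unit's bias through a linear map (bias + A · context).

```python
from typing import List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], n_past: int, n_future: int) -> List[Frame]:
    """Build one context vector per frame from neighbouring source frames.

    For frame t, the frames t-n_past .. t+n_future are concatenated, skipping
    frame t itself. Indices outside the utterance are clamped, i.e. the first
    or last frame is repeated (an assumption, not taken from the paper).
    """
    if not frames:
        return []
    last = len(frames) - 1
    contexts: List[Frame] = []
    for t in range(len(frames)):
        ctx: Frame = []
        for offset in range(-n_past, n_future + 1):
            if offset == 0:
                continue  # the current frame is not part of its own context here
            idx = min(max(t + offset, 0), last)
            ctx.extend(frames[idx])
        contexts.append(ctx)
    return contexts


def conditioned_bias(bias: Frame, weights: Sequence[Frame], context: Frame) -> Frame:
    """Return the context-dependent bias: bias + weights @ context.

    `weights` has one row per unit and one column per context dimension.
    """
    return [b + sum(w * c for w, c in zip(row, context))
            for b, row in zip(bias, weights)]


if __name__ == "__main__":
    # Toy utterance: three frames of one-dimensional source features.
    source = [[1.0], [2.0], [3.0]]
    contexts = stack_context(source, n_past=1, n_future=1)
    print(contexts)  # [[1.0, 2.0], [1.0, 3.0], [2.0, 3.0]]
    # One unit with a hypothetical static bias and context weights.
    print(conditioned_bias([0.5], [[1.0, 2.0]], contexts[1]))  # [7.5]
```

In this sketch the temporal information enters only through the context vector, so no delta (dynamic) features need to be appended to the acoustic features, which is the property the abstract emphasizes.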