Using bidirectional associative memories for joint spectral envelope modeling in voice conversion

Li-Juan Liu, Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai
DOI: https://doi.org/10.1109/ICASSP.2014.6855135
2014-01-01
ICASSP
Abstract: The spectral envelope is the most natural representation of the speech signal. In voice conversion, however, it is difficult to model the raw spectral envelope space directly with conventional Gaussian distributions, because the space is high-dimensional and its dimensions are strongly correlated. A bidirectional associative memory (BAM) is a two-layer feedback neural network that can better model the cross-dimensional correlations in high-dimensional vectors. In this paper, we propose reformulating BAMs as Gaussian distributions in order to model the spectral envelope space. The parameters of the BAMs are estimated using the contrastive divergence algorithm. Likelihood evaluations show that BAMs have better modeling ability than Gaussians with diagonal covariance, and subjective tests on voice conversion indicate that the proposed method performs significantly better than the conventional GMM-based method.
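To make the "BAM as a Gaussian" idea concrete, here is a minimal sketch under a stated assumption: the energy function E(x, y) = 0.5*x'x + 0.5*y'y - x'Wy with unit variances and zero means. This form is chosen for illustration and is not taken from the paper, and the function names are hypothetical. Under this assumption the joint density of source and target vectors is a Gaussian whose precision matrix carries the weights W in its off-diagonal blocks, so cross-dimensional correlations are modeled without a diagonal-covariance restriction. The sketch does not cover contrastive divergence training.

```python
import numpy as np

def bam_joint_precision(W):
    """Precision matrix of z = [x; y] under the assumed energy
    E(x, y) = 0.5*x'x + 0.5*y'y - x'Wy (an illustrative form, not the paper's)."""
    dx, dy = W.shape
    return np.block([[np.eye(dx), -W], [-W.T, np.eye(dy)]])

def is_valid_gaussian(W):
    """The joint density is a proper Gaussian only if the precision is positive definite."""
    return bool(np.all(np.linalg.eigvalsh(bam_joint_precision(W)) > 0))

def convert_source_to_target(W, x):
    """Conditional mean E[y | x] = W'x under the assumed energy."""
    return W.T @ x

# Hypothetical weights for a 3-dim source and 2-dim target, small enough to be valid.
W = np.array([[0.2, 0.1], [0.0, 0.3], [-0.1, 0.2]])
x = np.array([1.0, -0.5, 0.25])

# Full joint covariance, including the cross-dimensional terms.
cov = np.linalg.inv(bam_joint_precision(W))
y_hat = convert_source_to_target(W, x)
```

In this simplified setting, the conditional mean obtained from the joint covariance matches the single matrix product W'x, which is what makes the two-layer network usable in both directions (x to y and, symmetrically, y to x via Wy).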