Spectral Modeling Using Neural Autoregressive Distribution Estimators for Statistical Parametric Speech Synthesis

Xiang Yin, Zhen-Hua Ling, Li-Rong Dai
DOI: https://doi.org/10.1109/icassp.2014.6854317
2014-01-01
Abstract: This paper describes a new approach that uses neural autoregressive distribution estimators (NADE) for spectral modeling in statistical parametric speech synthesis. To alleviate the over-smoothing of generated spectral structures, our previous work proposed a restricted Boltzmann machine (RBM) modeling method, in which the RBM represents the joint distribution of high-dimensional, physically meaningful spectral envelopes. However, the partition function of an RBM is intractable even at moderate model sizes. In this paper, we introduce NADE to model the distribution of mel-cepstra and spectral envelopes at each HMM state, motivated by the simplicity with which it evaluates the probability of given observations. At synthesis time, the spectral parameters derived from the mode of each context-dependent NADE replace the Gaussian mean vectors in the parameter generation process. Experimental results show that NADE models the distribution of the spectral features more accurately than the RBM. Furthermore, the proposed method significantly improves the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra and outperforms RBM-based spectral modeling.
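The tractability argument in the abstract can be illustrated with a minimal sketch. It assumes the textbook binary NADE, which factorizes the joint probability as a product of conditionals p(v_i | v_<i), so the likelihood is normalized by construction and needs no partition function. This is an illustrative simplification, not the paper's model: the paper works with real-valued mel-cepstra and spectral envelopes, and the names `nade_log_prob`, `W`, `V`, `b`, `c` and the toy sizes are hypothetical.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nade_log_prob(v, W, V, b, c):
    """Log-probability of a binary vector v under a binary NADE (illustrative).

    W: (H, D) input-to-hidden weights, V: (D, H) hidden-to-output weights,
    b: (D,) output biases, c: (H,) hidden biases. All names are hypothetical.
    """
    a = c.copy()      # running hidden pre-activation, reused across dimensions
    log_p = 0.0
    for i in range(len(v)):
        h = sigmoid(a)                    # hidden state conditioned on v[:i]
        p_i = sigmoid(b[i] + V[i] @ h)    # p(v_i = 1 | v_<i)
        log_p += v[i] * np.log(p_i) + (1 - v[i]) * np.log(1 - p_i)
        a += W[:, i] * v[i]               # O(H) update before the next dimension
    return log_p

# Toy model with random parameters (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
D, H = 6, 4
W = rng.normal(scale=0.5, size=(H, D))
V = rng.normal(scale=0.5, size=(D, H))
b = rng.normal(scale=0.5, size=D)
c = rng.normal(scale=0.5, size=H)

# Summing exact probabilities over all 2**D binary vectors should give 1,
# because each conditional is a properly normalized Bernoulli distribution.
total = sum(
    np.exp(nade_log_prob(np.array(bits, dtype=float), W, V, b, c))
    for bits in itertools.product([0, 1], repeat=D)
)
```

In this sketch a single exact likelihood evaluation costs O(DH), whereas an RBM with the same visible dimension would require summing over exponentially many configurations to obtain its normalizer.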