Using conditional restricted Boltzmann machines for spectral envelope modeling in speech bandwidth extension.

Yingxue Wang, Shenghui Zhao, Dan Qu, Jingming Kuang
DOI: https://doi.org/10.1109/ICASSP.2016.7472815
2016-01-01
ICASSP
Abstract: In this paper, we present a conditional restricted Boltzmann machine (CRBM) based speech bandwidth extension (BWE) method. A CRBM is employed to capture temporal information and to model deep non-linear relationships between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Two separate CRBMs are adopted to model the distributions of the LF and HF spectral envelope features, respectively. A neural network (NN) is then used to model the joint distribution of the hidden variables extracted from the two CRBMs. The proposed method takes advantage of the CRBM's strong ability to discover temporal correlations between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional Gaussian mixture model based methods and other NN based methods.
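The abstract gives no layer sizes, feature definitions, or training procedure, so the following is only a minimal sketch of how the inference path it describes might be wired together: one CRBM encodes the LF frame conditioned on past LF frames, a mapping network converts LF hidden activations into HF hidden activations, and a second CRBM decodes them into HF envelope features. Everything named here (`CRBM`, `extend_frame`, the single-layer sigmoid mapper, Gaussian visible units, the use of previously estimated HF frames as history, and all dimensions) is an assumption made for illustration, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class CRBM:
    """Inference-only CRBM sketch (hypothetical; no training code)."""

    def __init__(self, n_visible, n_hidden, n_past, rng):
        self.n_visible = n_visible
        self.W = random_matrix(n_hidden, n_visible, rng)           # frame <-> hidden
        self.B = random_matrix(n_hidden, n_visible * n_past, rng)  # past -> hidden
        self.A = random_matrix(n_visible, n_visible * n_past, rng) # past -> frame
        self.c = [0.0] * n_hidden   # hidden bias
        self.a = [0.0] * n_visible  # visible bias

    def hidden_probs(self, frame, history):
        """P(h = 1 | current frame, past frames)."""
        past = [x for f in history for x in f]
        cur = matvec(self.W, frame)
        dyn = matvec(self.B, past)
        return [sigmoid(u + v + b) for u, v, b in zip(cur, dyn, self.c)]

    def visible_mean(self, hidden, history):
        """Mean of the visible units given hidden units and past frames
        (assumes Gaussian visible units with unit variance)."""
        past = [x for f in history for x in f]
        top_down = [
            sum(self.W[i][j] * hidden[i] for i in range(len(hidden)))
            for j in range(self.n_visible)
        ]
        auto = matvec(self.A, past)
        return [t + d + b for t, d, b in zip(top_down, auto, self.a)]


def extend_frame(lf_frame, lf_history, hf_history, lf_crbm, mapper, hf_crbm):
    """Estimate one HF envelope frame from LF features.

    `mapper` is a single sigmoid layer standing in for the NN that
    relates the hidden variables of the two CRBMs.
    """
    h_lf = lf_crbm.hidden_probs(lf_frame, lf_history)
    h_hf = [sigmoid(z) for z in matvec(mapper, h_lf)]
    return hf_crbm.visible_mean(h_hf, hf_history)


if __name__ == "__main__":
    rng = random.Random(0)
    n_past = 2                       # number of conditioning frames (assumed)
    lf_crbm = CRBM(4, 5, n_past, rng)
    hf_crbm = CRBM(3, 5, n_past, rng)
    mapper = random_matrix(5, 5, rng)

    lf_history = [[0.2, 0.1, 0.0, -0.1], [0.3, 0.1, 0.1, 0.0]]
    hf_history = [[0.0, 0.0, 0.0], [0.1, 0.0, -0.1]]
    hf_est = extend_frame([0.4, 0.2, 0.1, 0.0], lf_history, hf_history,
                          lf_crbm, mapper, hf_crbm)
    print(len(hf_est), "HF envelope coefficients estimated")
```

In a frame-by-frame loop, each estimated HF frame would be appended to `hf_history` (dropping the oldest entry) before the next call; whether the paper conditions the HF model this way is not stated in the abstract.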