Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension.

Yu Gu, Zhen-Hua Ling
DOI: https://doi.org/10.21437/interspeech.2017-336
2017-01-01
Abstract: This paper presents a waveform modeling and generation method for speech bandwidth extension (BWE) using stacked dilated convolutional neural networks (CNNs) with causal or non-causal convolutional layers. Such dilated CNNs describe the predictive distribution for each wideband or high-frequency speech sample conditioned on the input narrowband speech samples. Unlike conventional frame-based BWE approaches, the proposed methods model the speech waveforms directly and therefore avoid the spectral conversion and phase estimation problems. Experimental results show that, in subjective preference tests, the BWE methods proposed in this paper achieve better performance than the state-of-the-art frame-based approach, which uses recurrent neural networks (RNNs) with long short-term memory (LSTM) cells.
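To make the distinction between causal and non-causal dilated layers concrete, the following is a minimal sketch in plain Python. It is not the authors' architecture: the function names, the all-ones placeholder weights, the kernel sizes, and the dilation schedule (1, 2, 4) are assumptions chosen for illustration, and the sketch shows only how the receptive field of a stack grows, not how a predictive distribution over output samples is modeled.

```python
def dilated_conv1d(x, weights, dilation, causal=True):
    """One dilated 1-D convolution with zero padding; output length equals len(x)."""
    span = (len(weights) - 1) * dilation  # distance between first and last tap
    # Causal: every tap reads the current sample or earlier ones.
    # Non-causal: taps are centred, so some of them read future samples.
    offset = span if causal else span // 2
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - offset + i * dilation
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                acc += w * x[idx]
        y.append(acc)
    return y


def stack(x, kernel_size, dilations, causal=True):
    """Apply several dilated layers in sequence (placeholder all-ones weights)."""
    weights = [1.0] * kernel_size  # hypothetical weights, not learned values
    for d in dilations:
        x = dilated_conv1d(x, weights, d, causal)
    return x


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Feed a unit impulse through each stack and record which outputs it reaches.
impulse = [0.0] * 32
impulse[8] = 1.0
causal_out = stack(impulse, kernel_size=2, dilations=[1, 2, 4], causal=True)
causal_hits = [i for i, v in enumerate(causal_out) if v != 0]

centred = [0.0] * 33
centred[16] = 1.0
noncausal_out = stack(centred, kernel_size=3, dilations=[1, 2, 4], causal=False)
noncausal_hits = [i for i, v in enumerate(noncausal_out) if v != 0]
```

In the causal stack the impulse reaches only the outputs at or after its own position, whereas in the non-causal stack it spreads symmetrically to both sides. In both cases the number of affected outputs equals the value returned by `receptive_field`, which grows with the sum of the dilations rather than with the number of layers alone.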