Single Channel Speech Enhancement Based on Temporal Convolutional Network

Chao Li, Ting Jiang, Jiacheng Yu
DOI: https://doi.org/10.1109/icsip52628.2021.9688611
2021-01-01
Abstract: To address high-frequency artifacts in speech waveform estimation, a single-channel enhancement algorithm based on a temporal convolutional network (TCN) model with modulation-domain auxiliary features (Mod-TCN) is proposed. Specifically, the deep learning model consists of an encoder, a TCN module, a decoder, and a modulation-domain auxiliary feature extractor. The encoder and decoder transform between the speech waveform and its latent representation. The causal, dilated convolutions of the TCN module are equipped with skip and residual connections to accelerate model convergence. Furthermore, the modulation-domain auxiliary feature extractor is composed of multiple depthwise separable convolutions and is mainly used to learn modulation-domain features automatically. Experimental results show that, by combining modulation-domain and time-domain features, the proposed Mod-TCN model gives consistently better enhancement results than the existing TCN model.
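
The causal, dilated convolution with skip and residual connections mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no layer sizes, activations, or normalization, so the single-channel signal, the ReLU activation, the kernel values, and the names `causal_dilated_conv1d`, `tcn_block`, and `tcn_stack` are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def causal_dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                          dilation: int) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero (left padding), so y[t]
    never depends on inputs later than t. That is what makes it causal.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def tcn_block(x: Sequence[float], kernel: Sequence[float],
              dilation: int) -> Tuple[List[float], List[float]]:
    """One illustrative block: returns (residual_out, skip_out).

    residual_out = x + h is passed to the next block; skip_out = h is
    summed across blocks to form the stack output. (Assumed layout.)
    """
    h = relu(causal_dilated_conv1d(x, kernel, dilation))
    residual_out = [a + b for a, b in zip(x, h)]
    return residual_out, h


def tcn_stack(x: Sequence[float], kernels: Sequence[Sequence[float]],
              dilations: Sequence[int]) -> List[float]:
    """Chain blocks with growing dilation and sum their skip outputs."""
    current = list(x)
    skip_sum = [0.0] * len(current)
    for kernel, dilation in zip(kernels, dilations):
        current, skip = tcn_block(current, kernel, dilation)
        skip_sum = [s + k for s, k in zip(skip_sum, skip)]
    return skip_sum


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(causal_dilated_conv1d(impulse, [1.0, 1.0], dilation=2))
    print(tcn_stack(impulse, [[1.0, 1.0], [1.0, 1.0]], dilations=[1, 2]))
```

Doubling the dilation from block to block (1, 2, 4, ...) is a common way to grow the receptive field exponentially with depth while each output sample still depends only on past and present inputs; whether Mod-TCN uses exactly this schedule is not stated in the abstract.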