A Two-Stage Frequency-Time Dilated Dense Network for Speech Enhancement

Xiangdong Huang, Honghong Chen, Wei Lu
DOI: https://doi.org/10.1016/j.apacoust.2022.109107
IF: 3.614
2022-01-01
Applied Acoustics
Abstract: Speech enhancement systems are used in many devices, such as hearing aids. To improve the quality of speech retrieved from noisy observations, this paper proposes a two-stage network built on the frequency-time dilated dense network (FTDDN). The improvement lies in three aspects. First, both frequency modeling and temporal modeling are considered to optimize a time-frequency mask. Second, to acquire a large receptive field, dilated convolution is incorporated into three basic processing units: the frequency-dilated convolutional unit (FDCU), the time-dilated convolutional unit (TDCU), and the frequency-time dilated convolutional unit (FTDCU). Third, for each unit type, 12 units are densely connected to assemble a frequency-dilated dense block (FDDB), a time-dilated dense block (TDDB), or a frequency-time dilated dense block (FTDDB), all of which are combined with feature mapping operators to build an FTDDN. With these considerations, high-quality speech can be retrieved by implementing information reuse and feature fusion operations on two FTDDNs in a two-stage model. Using the Librispeech and VCTK data sets, we conducted several experimental comparisons between our method and state-of-the-art speech enhancement methods, showing that the proposed model outperforms these baseline models. (c) 2022 Elsevier Ltd. All rights reserved.
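The abstract's two structural ideas, dilation for a large receptive field and dense connection of 12 units, can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: the kernel size (3), the dilation schedule (1, 2, 4, 8 repeated three times), the input width (16 channels), and the growth rate (8 channels per unit) are all assumptions chosen for illustration, since the abstract does not state them. The function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along the longest path through a chain of
    stride-1 dilated convolutions: each layer adds (k - 1) * d."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dense_input_channels(in_channels, growth, n_units):
    """Input channel count seen by each unit in a densely connected
    block, where unit i receives the block input plus the outputs of
    all i earlier units (each assumed to emit `growth` channels)."""
    return [in_channels + i * growth for i in range(n_units)]


N_UNITS = 12   # from the abstract: 12 units per dense block
KERNEL = 3     # assumption: kernel size is not given in the abstract

plain = [1] * N_UNITS        # no dilation, for comparison
dilated = [1, 2, 4, 8] * 3   # assumption: hypothetical dilation schedule

print(receptive_field(KERNEL, plain))    # 25
print(receptive_field(KERNEL, dilated))  # 91

# Assumed widths: 16 input channels, 8 new channels per unit.
print(dense_input_channels(16, 8, N_UNITS)[-1])  # 104
```

Under these assumed settings, the same 12 units cover 91 positions instead of 25 along the dilated axis (frequency, time, or both), while dense connection lets the last unit reuse every earlier unit's features.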