Time-frequency Complex Mask Network for Echo Cancellation and Noise Suppression.

Ning Sun, Hongqing Liu, Lu Gan, Yu Zhao, Zhen Luo, Yi Zhou
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys57074.2022.00336
2022-01-01
Abstract: This work investigates the integration of traditional methods and deep learning techniques for acoustic echo cancellation (AEC). To that end, the generalized cross correlation (GCC) algorithm is used to align the far-end signal with the echo in the near-end microphone signal, and the echo in the near-end microphone signal is then estimated by adaptive filtering. After that, both the error signal and the estimated echo are sent to a time-frequency complex mask neural network (TFCN) to suppress residual echo and environmental noise. In TFCN, the dual-channel input formed by the error signal and the estimated echo is processed by LSTMs in the frequency domain, and the concatenated result is converted back to the time domain. Finally, in the time domain, a convolutional network produces the final target speech. Experimental results show that the proposed framework is robust on the blind test set, effectively removes echo and noise, and achieves excellent performance in AECMOS scores. The subjective average score of the proposed method is 4.41, which is 0.54 higher than the INTERSPEECH 2021 AEC-Challenge baseline.
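The classical front end described in the abstract (GCC alignment followed by adaptive filtering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the PHAT weighting, the choice of NLMS as the adaptive filter, the function names, and all parameter values (`max_delay`, `taps`, `mu`) are assumptions made for illustration only. The two outputs of the filter, the error signal and the estimated echo, correspond to the dual-channel input that the abstract says is passed on to TFCN, which is not sketched here.

```python
import numpy as np


def gcc_phat_delay(mic, far, max_delay):
    """Estimate by how many samples `mic` lags `far` (GCC, PHAT weighting assumed)."""
    n = len(mic) + len(far)                    # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(mic, n=n) * np.conj(np.fft.rfft(far, n=n))
    cross /= np.abs(cross) + 1e-12             # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so index 0 is lag -max_delay and the last index is lag +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(np.abs(cc))) - max_delay


def align_far_end(far, delay):
    """Shift `far` right by `delay` samples (negative delays are clamped to 0)."""
    delay = max(delay, 0)
    return np.concatenate((np.zeros(delay), far[:len(far) - delay]))


def nlms_echo_estimate(mic, far_aligned, taps=32, mu=0.5, eps=1e-8):
    """Run an NLMS adaptive filter (assumed variant); return (error, estimated echo)."""
    w = np.zeros(taps)                         # adaptive filter weights
    buf = np.zeros(taps)                       # most recent far-end samples, newest first
    echo_hat = np.zeros(len(mic))
    err = np.zeros(len(mic))
    for i in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_aligned[i]
        echo_hat[i] = w @ buf                  # current echo estimate
        err[i] = mic[i] - echo_hat[i]          # residual after echo removal
        w += mu * err[i] * buf / (buf @ buf + eps)  # normalized LMS update
    return err, echo_hat


if __name__ == "__main__":
    # Synthetic example: white-noise far end, delayed and filtered to form the echo.
    rng = np.random.default_rng(0)
    far = rng.standard_normal(4000)
    mic = np.convolve(align_far_end(far, 37), [0.6, 0.3, 0.1])[:4000]
    d = gcc_phat_delay(mic, far, max_delay=200)
    err, echo_hat = nlms_echo_estimate(mic, align_far_end(far, d))
    print("estimated delay:", d)
    print("residual power (second half):", np.mean(err[2000:] ** 2))
```

Aligning first keeps the adaptive filter short: without the GCC step, the filter would need enough taps to cover the bulk delay as well as the echo path itself.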