Improving Acoustic Echo Cancellation by Exploring Speech and Echo Affinity with Multi-Head Attention.

Yiqun Zhang, Xinmeng Xu, Weiping Tu
DOI: https://doi.org/10.1109/ICASSP48485.2024.10446389
2024-01-01
Abstract: Deep learning-based approaches formulate acoustic echo cancellation (AEC) as a supervised speech separation task, where the mixture signal and the far-end signal are combined directly before or after the encoding stage. However, the mixture signal and the far-end signal are not integrated sufficiently due to the lack of interpretability for the affinity between speech and echo in a noisy mixture. In this paper, we propose DCA-Net, a dual-branch cross-attention neural network, to improve AEC performance by exploring the affinities between speech and echo in the representation space. In particular, the two branches predict speech and echo, respectively, and an interaction module is placed at several intermediate feature domains to learn the correlations between the features of the two branches. This interaction lets features learned in one branch restore missing information or counteract undesired information in the other, by computing the similarity between the two branches' features with multi-head cross attention. Evaluation results show that the proposed DCA-Net effectively suppresses acoustic echo and noise while preserving good speech quality.
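The interaction described in the abstract can be illustrated with a minimal NumPy sketch of multi-head cross attention between two feature streams. This is not the DCA-Net implementation: the abstract does not specify layer shapes, projections, or how the attended features are merged back, so the function names (`multi_head_cross_attention`, `interact`, `init_params`), the random projection matrices, and the residual addition are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(dim, rng):
    # Hypothetical random projections; a trained model would learn these.
    scale = 1.0 / np.sqrt(dim)
    return {name: rng.standard_normal((dim, dim)) * scale
            for name in ("Wq", "Wk", "Wv", "Wo")}


def multi_head_cross_attention(query_feats, context_feats, params, num_heads):
    """Queries come from one branch, keys and values from the other.

    query_feats:   (T_q, D) features of the branch being updated
    context_feats: (T_c, D) features of the other branch
    Returns the attended features (T_q, D) and the attention weights
    (num_heads, T_q, T_c).
    """
    t_q, dim = query_feats.shape
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = dim // num_heads

    def split_heads(x):
        # (T, D) -> (num_heads, T, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(query_feats @ params["Wq"])
    k = split_heads(context_feats @ params["Wk"])
    v = split_heads(context_feats @ params["Wv"])

    # Scaled dot-product similarity between the two branches' features.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of the other branch's values, then merge the heads.
    attended = (weights @ v).transpose(1, 0, 2).reshape(t_q, dim)
    return attended @ params["Wo"], weights


def interact(speech_feats, echo_feats, speech_params, echo_params, num_heads=4):
    # Each branch attends to the other; the residual addition is an assumption.
    speech_update, _ = multi_head_cross_attention(
        speech_feats, echo_feats, speech_params, num_heads)
    echo_update, _ = multi_head_cross_attention(
        echo_feats, speech_feats, echo_params, num_heads)
    return speech_feats + speech_update, echo_feats + echo_update
```

In this sketch, `interact` would be called once per intermediate stage, with the speech branch querying the echo branch and vice versa, so that each branch can draw on what the other has learned.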