Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement

Koen Oostermeijer, Qing Wang, Jun Du
DOI: https://doi.org/10.21437/interspeech.2021-668
2021-01-01
Abstract: In this paper, we describe a novel speech enhancement transformer architecture. The model uses local causal self-attention, which makes it lightweight and therefore particularly well suited for real-time speech enhancement in computationally resource-limited environments. In addition, we provide several ablation studies that focus on different parts of the model and the loss function to determine which modifications yield the largest improvements. Using this knowledge, we propose a final version of our architecture, which we submitted to the INTERSPEECH 2021 DNS Challenge, where it achieved competitive results despite using only 2% of the maximally allowed computation. Furthermore, we performed experiments comparing it with LSTM and CNN models that had 127% and 257% more parameters, respectively. Despite this difference in model size, we achieved significant improvements on the considered speech quality and intelligibility measures.
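The abstract does not spell out how local causal self-attention is computed. As a rough illustration only, the following Python sketch shows a generic single-head version. Each frame attends to itself and at most `window - 1` preceding frames, so no future frames are needed (which is what makes streaming possible), and the per-frame cost is bounded by the window rather than by the sequence length. The window size, the lack of learned projections and multiple heads, and the helper names `local_causal_attention` and `score_count` are all assumptions for illustration, not details of the authors' model.

```python
import math
from typing import List, Optional

Vector = List[float]


def local_causal_attention(q: List[Vector], k: List[Vector],
                           v: List[Vector], window: int) -> List[Vector]:
    """Hypothetical helper: single-head scaled dot-product attention in which
    frame t attends only to frames max(0, t - window + 1) .. t, i.e. the
    current frame and a bounded number of past frames (no look-ahead).
    Assumption: q, k, v are already projected; no heads or learned weights."""
    if window < 1:
        raise ValueError("window must be >= 1")
    scale = 1.0 / math.sqrt(len(q[0]))
    out: List[Vector] = []
    for t, q_t in enumerate(q):
        span = range(max(0, t - window + 1), t + 1)  # causal, local window
        # Attention logits only for frames inside the window.
        logits = [scale * sum(a * b for a, b in zip(q_t, k[s])) for s in span]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        # Softmax-weighted sum of the value vectors inside the window.
        out_t = [0.0] * len(v[0])
        for w_s, s in zip(weights, span):
            for i, val in enumerate(v[s]):
                out_t[i] += (w_s / z) * val
        out.append(out_t)
    return out


def score_count(num_frames: int, window: Optional[int] = None) -> int:
    """Hypothetical helper: number of query-key dot products for a sequence
    of num_frames frames. window=None means full causal attention, where
    every past frame is visible."""
    if window is None:
        return num_frames * (num_frames + 1) // 2
    return sum(min(t + 1, window) for t in range(num_frames))


if __name__ == "__main__":
    # Hypothetical sizes: 1000 frames and a 50-frame attention window.
    print(score_count(1000))      # full causal attention: 500500
    print(score_count(1000, 50))  # local causal attention: 48775
```

With this hypothetical 50-frame window, the number of query-key scores for 1000 frames drops from 500500 to 48775, roughly a tenfold reduction. This is one way to see why restricting attention to a local window keeps a transformer lightweight, although the actual savings in the paper depend on configuration details the abstract does not give.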