Streaming Dual-Path Transformer for Speech Enhancement

Hyun-Sub Lim, S. Bae, Youngseok Kim, Lae-Hoon Kim, Keunsang Lee, Seok Wan Chae
DOI: https://doi.org/10.21437/interspeech.2023-751
2023-08-20
Abstract: Speech enhancement employing a dual-path transformer (DPT) with a dilated DenseNet-based encoder and decoder has shown state-of-the-art performance. By applying attention along both the time and frequency paths, the DPT learns the long-term dependencies of speech and the relationships between frequency components. However, the DPT operates in batch mode, performing attention over all past and future frames, which makes it impractical for real-time applications. To satisfy the real-time requirement, we propose a streaming dual-path transformer (stDPT) with a zero look-ahead structure. In the training phase, we apply masking techniques to control the context length; in the inference phase, caching methods are used to preserve sequential information. Extensive experiments evaluate performance across different context lengths, and the results verify that the proposed method outperforms current state-of-the-art real-time speech enhancement models.
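The relationship between training-time masking and inference-time caching can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`causal_context_mask`, `masked_batch_attention`, `streaming_attention`) are hypothetical, the attention uses each frame directly as query, key, and value with no learned projections, and only the time path is modeled. Under these assumptions, a mask that hides future frames and frames older than the context length yields the same outputs as frame-by-frame processing with a fixed-size cache.

```python
import math
from collections import deque


def causal_context_mask(num_frames, context):
    """mask[t][s] is True when frame t may attend to frame s.

    Zero look-ahead means s <= t; the limited context means t - s < context.
    """
    return [[(s <= t) and (t - s < context) for s in range(num_frames)]
            for t in range(num_frames)]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over lists of vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def masked_batch_attention(frames, context):
    """Training-style pass: the whole sequence is available, the mask limits it."""
    mask = causal_context_mask(len(frames), context)
    outputs = []
    for t, query in enumerate(frames):
        visible = [frames[s] for s in range(len(frames)) if mask[t][s]]
        outputs.append(attend(query, visible, visible))
    return outputs


def streaming_attention(frames, context):
    """Inference-style pass: one frame at a time, with a cache of recent frames."""
    cache = deque(maxlen=context)  # the oldest frame is dropped automatically
    outputs = []
    for frame in frames:
        cache.append(frame)
        outputs.append(attend(frame, list(cache), list(cache)))
    return outputs
```

Calling both functions on the same frame sequence with the same `context` value gives matching outputs, which is the property that lets a model trained with masking run frame by frame at inference time. In a real transformer the cache would hold projected keys and values per layer rather than raw frames.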
Computer Science