Dual-Branch Modeling Based on State-Space Model for Speech Enhancement

Linhui Sun, Shuo Yuan, Aifei Gong, Lei Ye, Eng Siong Chng
DOI: https://doi.org/10.1109/taslp.2024.3362691
2024-01-01
Abstract: Traditional time-frequency domain speech enhancement methods fall into two camps. Some enhance only the amplitude spectrum and leave the noisy phase unchanged, even though phase contributes to naturalness, intelligibility, and harmonic structure. Others estimate the complex spectrum (its real and imaginary components) directly, which limits the accuracy of the resulting amplitude and phase estimates. To address this issue, we propose a joint dual-branch structured state-space model that combines the strengths of both approaches while keeping computational complexity low. Specifically, we introduce interaction modules between the two branches to exchange information, so that features learned in one branch can compensate for information missing from the other. Furthermore, to reduce model complexity, we adopt the diagonal version of the structured state-space sequence model (S4D) to denoise the speech feature sequences in both branches. Experimental results on the VoiceBank+DEMAND and TIMIT+NOISE92 datasets show that our low-complexity model achieves significant improvements over previous advanced systems.
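The S4D component can be illustrated with a minimal NumPy sketch of a single-channel diagonal state-space layer. The sketch rests on several assumptions: zero-order-hold discretization, the S4D-Lin initialization A_n = -1/2 + iπn, conjugate-symmetric modes (hence the factor 2 in the kernel), and arbitrary values for the state size `N`, step size `dt`, and skip term `D`. The function names are hypothetical. This is not the authors' implementation, and it omits the dual-branch structure and interaction modules entirely. It computes the layer two ways: as a causal convolution with the kernel K[l] = 2 Re(Σ_n C_n B̄_n Ā_n^l), and as the equivalent linear recurrence. Together they show why a diagonal A keeps the cost low, since the kernel reduces to a Vandermonde product.

```python
import numpy as np

def s4d_kernel(A, B, C, dt, L):
    """Kernel K[l] = 2 * Re(sum_n C_n * dB_n * dA_n**l) for l = 0..L-1.

    A, B, C are complex arrays of shape (N,), one entry per diagonal state;
    only one member of each conjugate pair is stored, so the real part is doubled.
    """
    dA = np.exp(dt * A)                              # ZOH: A_bar = exp(dt * A)
    dB = (dA - 1.0) / A * B                          # ZOH: B_bar = (A_bar - 1) / A * B
    powers = dA[:, None] ** np.arange(L)[None, :]    # Vandermonde matrix, shape (N, L)
    return 2.0 * np.real((C * dB) @ powers)

def s4d_apply(u, K, D):
    """Causal convolution y = K * u + D * u, computed with an FFT zero-padded to 2L."""
    L = u.shape[-1]
    n = 2 * L                                        # padding avoids circular wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(K, n), n)[..., :L]
    return y + D * u

def s4d_recurrent(u, A, B, C, dt, D):
    """Same layer run step by step as x_k = A_bar x_{k-1} + B_bar u_k."""
    dA = np.exp(dt * A)
    dB = (dA - 1.0) / A * B
    x = np.zeros_like(A)                             # complex state, starts at zero
    ys = []
    for u_k in u:
        x = dA * x + dB * u_k
        ys.append(2.0 * np.real(np.dot(C, x)) + D * u_k)
    return np.array(ys)

# Toy usage on one random sequence (a stand-in for one feature channel over time).
rng = np.random.default_rng(0)
N, L = 16, 200
A = -0.5 + 1j * np.pi * np.arange(N)                 # S4D-Lin initialization (assumed)
B = np.ones(N, dtype=complex)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
dt, D = 0.05, 1.0                                    # illustrative values only

u = rng.standard_normal(L)
K = s4d_kernel(A, B, C, dt, L)
y_conv = s4d_apply(u, K, D)
y_rec = s4d_recurrent(u, A, B, C, dt, D)
print(np.allclose(y_conv, y_rec))                    # the two views should agree
```

Because A is diagonal, building the length-L kernel costs O(NL) and the FFT convolution O(L log L). The recurrent form is mainly useful for step-by-step inference. In a full model, each feature channel would typically have its own learned A, B, C, and `dt`.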
Subject categories: Engineering, Electrical & Electronic; Acoustics