Abstract: Deep-learning-based full-band speech enhancement methods have become increasingly widespread in recent years. To balance denoising performance against computational complexity, mainstream full-band approaches typically use compressed, perceptually motivated features with relatively low frequency resolution in the middle and high frequencies to recover the full-band spectrum, which limits the upper bound of speech quality. More recently, sub-band fusion approaches have been developed in which the low-frequency and high-frequency bands are processed separately, but these neglect the full-band spectral pattern and cross-band dependencies. This paper proposes a dual-stage full- and sub-band integration network, dubbed FSI-Net, that simultaneously leverages the coarse-grained full-band spectral pattern and the fine-grained sub-band spectral details for full-band speech enhancement. Concretely, the first stage performs only coarse denoising on the compressed ERB-scaled spectrum, capturing the global full-band spectral context while keeping the computational overhead low. In the second stage, because the spectral characteristics of speech vary across frequency bands, two dedicated sub-networks refine the low-frequency and high-frequency bands separately in the complex domain. To fully exploit cross-band guidance, a band-guided encoder provides external knowledge for the high-frequency bands. Extensive experiments show that the proposed method consistently outperforms state-of-the-art one-stage full-band and sub-band fusion baselines across various evaluation metrics.
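To make the two front-end operations mentioned in the abstract more concrete, the following is a minimal sketch of (i) compressing a full-band spectrum onto an ERB-rate scale and (ii) splitting the spectrum into low- and high-frequency bands. It is not the authors' implementation. The ERB-rate formula is the standard Glasberg and Moore approximation; the function names, the 48 kHz sample rate, the 481 frequency bins, the 32 ERB bands, and the 8 kHz split point are all assumptions chosen for illustration, and band averaging is only one possible compression rule.

```python
import math


def hz_to_erb(f_hz):
    """ERB-rate scale (Glasberg and Moore approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_to_hz(erb):
    """Inverse of hz_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bins, sample_rate, n_bands):
    """Return n_bands + 1 strictly increasing bin indices from 0 to n_bins.

    Bands are uniform on the ERB-rate scale, so they are narrow at low
    frequencies and wide at high frequencies. Requires n_bins >= n_bands.
    """
    nyquist = sample_rate / 2.0
    max_erb = hz_to_erb(nyquist)
    edges = [0]
    for b in range(1, n_bands + 1):
        f_hz = erb_to_hz(max_erb * b / n_bands)
        idx = round(f_hz / nyquist * n_bins)
        idx = max(idx, edges[-1] + 1)            # at least one bin per band
        idx = min(idx, n_bins - (n_bands - b))   # leave room for later bands
        edges.append(idx)
    edges[-1] = n_bins                           # last band ends at Nyquist
    return edges


def erb_compress(power_spectrum, edges):
    """Average the power inside each ERB band (assumed compression rule)."""
    return [
        sum(power_spectrum[lo:hi]) / (hi - lo)
        for lo, hi in zip(edges, edges[1:])
    ]


def split_bands(spectrum, sample_rate, split_hz):
    """Split one frame of frequency bins into low and high bands."""
    nyquist = sample_rate / 2.0
    k = round(split_hz / nyquist * (len(spectrum) - 1))
    return spectrum[:k], spectrum[k:]


# Hypothetical configuration: 48 kHz audio, 960-point FFT (481 bins).
SAMPLE_RATE = 48000
N_BINS = 481
N_BANDS = 32

edges = erb_band_edges(N_BINS, SAMPLE_RATE, N_BANDS)
frame = [1.0] * N_BINS                     # a flat dummy power spectrum
coarse = erb_compress(frame, edges)        # 32-dimensional stage-1 input
low, high = split_bands(frame, SAMPLE_RATE, split_hz=8000)
```

In a sketch of this kind, the coarse ERB representation would feed the first-stage network, while the low and high slices would be handed to the two second-stage sub-networks. How FSI-Net actually parameterizes the bands and the split frequency is not stated in the abstract.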
